* Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
From: Jan Kara @ 2014-12-02 12:58 UTC
To: Ted Ts'o; +Cc: linux-fsdevel, linux-ext4, Jan Kara, linux-btrfs, xfs

On Fri 28-11-14 13:14:21, Ted Tso wrote:
> On Fri, Nov 28, 2014 at 06:23:23PM +0100, Jan Kara wrote:
> >   Hum, when someone calls fsync() for an inode, you likely want to sync
> > timestamps to disk even if everything else is clean. I think that doing
> > what you did in last version:
> >     dirty = inode->i_state & I_DIRTY_INODE;
> >     inode->i_state &= ~I_DIRTY_INODE;
> >     spin_unlock(&inode->i_lock);
> >     if (dirty & I_DIRTY_TIME)
> >         mark_inode_dirty_sync(inode);
> > looks better to me. IMO when someone calls __writeback_single_inode() we
> > should write whatever we have...
>
> Yes, but we also have to distinguish between what happens on an
> fsync() versus what happens on a periodic writeback if I_DIRTY_PAGES
> (but not I_DIRTY_SYNC or I_DIRTY_DATASYNC) is set. So there is a
> check in the fsync() code path to handle the concern you raised above.

  Ah, this is the thing you have been likely talking about but which I was
constantly missing in my thoughts. You don't want to write times when inode
has only dirty pages and timestamps - I was always thinking about a
situation where inode has only dirty timestamps and not pages. This
situation also complicates the writeback logic because when inode has dirty
pages, you need to track it as normal dirty inode for page writeback (with
dirtied_when corresponding to time when pages were dirtied) but in parallel
you now need to track the information that inode has timestamps that
weren't written for X long. And even if we stored how old the timestamps
are, it isn't easily possible to keep the list of inodes with just dirty
timestamps sorted by dirty time.
So now I finally understand why you did things the way you did them...
Sorry for misleading you.

So let's restart the design so that things are clear:

1) We have new inode bit I_DIRTY_TIME. This means that only timestamps in
   the inode have changed. The desired behavior is that an inode with
   I_DIRTY_TIME and without I_DIRTY_SYNC | I_DIRTY_DATASYNC is written by
   background writeback only once per 24 hours. Such inodes do get written
   by sync(2) and fsync(2) calls.

2) Inodes with only I_DIRTY_TIME are tracked in a new dirty list
   b_dirty_time. We use i_wb_list list head for this. Unlike b_dirty list,
   this list isn't kept sorted by dirtied_when. If queue_io() sees for_sync
   bit set in the work item, it will call mark_inode_dirty_sync() for all
   inodes in b_dirty_time before queuing io from b_dirty list. Once per
   hour (or something like that) flusher thread scans the whole
   b_dirty_time list and calls mark_inode_dirty_sync() for all inodes that
   have too old dirty timestamps (to detect this we need a new time stamp
   in the inode).

3) When fsync() sees inode with I_DIRTY_TIME set, it calls
   mark_inode_dirty_sync().

4) When we are dropping last inode reference and inode has I_DIRTY_TIME
   set, we call mark_inode_dirty_sync().

And that should be it, right?

                                                                Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
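[Editorial sketch, not part of the original message.] The four points above
can be modelled in a few lines of user-space C. This is only a toy under
stated assumptions: struct toy_inode, the toy_* function names, the flag
values, and the choice to clear I_DIRTY_TIME when an inode is promoted are
all made up for illustration, and the for_sync pass and the periodic expiry
scan from point 2 are folded into one function for brevity. It is not the
patch under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; these are NOT the kernel's definitions. */
enum {
    I_DIRTY_SYNC     = 1 << 0,
    I_DIRTY_DATASYNC = 1 << 1,
    I_DIRTY_PAGES    = 1 << 2,
    I_DIRTY_TIME     = 1 << 3,
};

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
    unsigned int state;      /* models inode->i_state */
    long dirtied_time_when;  /* the "new time stamp" from point 2, in seconds */
};

/* Models mark_inode_dirty_sync(): the inode becomes an ordinary dirty inode.
 * Clearing I_DIRTY_TIME here is an assumption of this model. */
void toy_mark_inode_dirty_sync(struct toy_inode *inode)
{
    inode->state = (inode->state & ~(unsigned int)I_DIRTY_TIME) | I_DIRTY_SYNC;
}

/* Point 1: a lazy timestamp update only sets I_DIRTY_TIME. */
void toy_update_time(struct toy_inode *inode, long now)
{
    if (inode->state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
        return;  /* already queued for a normal inode write */
    if (!(inode->state & I_DIRTY_TIME)) {
        inode->state |= I_DIRTY_TIME;
        inode->dirtied_time_when = now;  /* remember the oldest unwritten change */
    }
}

/* Points 3 and 4: fsync() and the final reference drop promote the inode. */
void toy_fsync(struct toy_inode *inode)
{
    if (inode->state & I_DIRTY_TIME)
        toy_mark_inode_dirty_sync(inode);
}

/* Point 2: walk the (unsorted) dirty-time set. A sync promotes everything;
 * the periodic scan promotes only inodes whose timestamps are too old.
 * Returns the number of inodes promoted. */
size_t toy_scan_dirty_time(struct toy_inode *inodes, size_t n,
                           bool for_sync, long now, long max_age)
{
    size_t promoted = 0;

    assert(inodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_inode *inode = &inodes[i];

        if (!(inode->state & I_DIRTY_TIME))
            continue;
        if (for_sync || now - inode->dirtied_time_when >= max_age) {
            toy_mark_inode_dirty_sync(inode);
            promoted++;
        }
    }
    return promoted;
}
```

Because the set is not sorted by age, the periodic scan has to visit every
entry; that is the cost Jan accepts in point 2 in exchange for not keeping
b_dirty_time ordered.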
* Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
From: Boaz Harrosh @ 2014-12-02 17:55 UTC
To: Jan Kara, Ted Ts'o; +Cc: linux-fsdevel, linux-ext4, linux-btrfs, xfs

On 12/02/2014 02:58 PM, Jan Kara wrote:
> On Fri 28-11-14 13:14:21, Ted Tso wrote:
>> On Fri, Nov 28, 2014 at 06:23:23PM +0100, Jan Kara wrote:
>>>   Hum, when someone calls fsync() for an inode, you likely want to sync
>>> timestamps to disk even if everything else is clean. I think that doing
>>> what you did in last version:
>>>     dirty = inode->i_state & I_DIRTY_INODE;
>>>     inode->i_state &= ~I_DIRTY_INODE;
>>>     spin_unlock(&inode->i_lock);
>>>     if (dirty & I_DIRTY_TIME)
>>>         mark_inode_dirty_sync(inode);
>>> looks better to me. IMO when someone calls __writeback_single_inode() we
>>> should write whatever we have...
>>
>> Yes, but we also have to distinguish between what happens on an
>> fsync() versus what happens on a periodic writeback if I_DIRTY_PAGES
>> (but not I_DIRTY_SYNC or I_DIRTY_DATASYNC) is set. So there is a
>> check in the fsync() code path to handle the concern you raised above.
>   Ah, this is the thing you have been likely talking about but which I was
> constantly missing in my thoughts. You don't want to write times when inode
> has only dirty pages and timestamps -

This I do not understand. I thought that I_DIRTY_TIME, and the whole
lazytime mount option, is only for atime. So if there are dirty
pages then there are also m/ctime that changed and surely we want to
write these times to disk ASAP.

If we are lazytime also with m/ctime, then I think I would like an
option for only atime lazy, because m/ctime is cardinal to some
operations even though I might want atime lazy.

Sorry for the slowness, I'm probably missing something

Thanks
Boaz
* Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
From: Theodore Ts'o @ 2014-12-02 19:23 UTC
To: Boaz Harrosh; +Cc: linux-fsdevel, linux-ext4, Jan Kara, linux-btrfs, xfs

On Tue, Dec 02, 2014 at 07:55:48PM +0200, Boaz Harrosh wrote:
>
> This I do not understand. I thought that I_DIRTY_TIME, and the whole
> lazytime mount option, is only for atime. So if there are dirty
> pages then there are also m/ctime that changed and surely we want to
> write these times to disk ASAP.

What are the situations where you are most concerned about mtime or
ctime being accurate after a crash?

I've been running with it on my laptop for a while now, and it's
certainly not a problem for build trees; remember, whenever you need
to update the inode to update i_blocks or i_size, the inode (with its
updated timestamps) will be flushed to disk anyway.

In actual practice, what happens in a build tree is that when make
decides that it needs to update a generated file, when the file is
created as a zero-length inode, m/ctime will be set to the time that
file is created, which is newer than its source files. As the file is
written, the mtime is updated each time that we actually need to do an
allocating write. In the case of the linker, it will seek to the
beginning of the file to update the ELF header at the very end of its
operation, and *that* time will be left stale, such that the in-memory
mtime is perhaps a millisecond ahead of the on-disk mtime. But in the
case of a crash, either time is such that make won't be confused.

I'm not aware of an application which is doing a large number of
non-allocating random writes (for example, such as a database), where
said database actually cares about mtime being correct.
In fact, most databases use fdatasync() to prevent the mtimes from
being sync'ed out to disk on each transaction, so they don't have
guaranteed timestamp accuracy after a crash anyway. The problem is
even if the database is using fdatasync(), every five seconds we end
up updating the mtime anyway --- and in the case of ext4, we end up
needing to take various journal locks which on a sufficiently parallel
workload and a sufficiently fast disk, can actually cause measurable
contention.

Did you have such a use case or application in mind?

> If we are lazytime also with m/ctime, then I think I would like an
> option for only atime lazy, because m/ctime is cardinal to some
> operations even though I might want atime lazy.

If there's a sufficiently compelling use case where we do actually
care about mtime/ctime being accurate, and the current semantics don't
provide enough of a guarantee, it's certainly something we could do.
I'd rather keep things simple unless it's really there. (After all,
we did create the strictatime mount option, but I'm not sure anyone
ever ends up using it. It would be a shame if we created a
strictcmtime, which had the same usage rate.)

I'll also note that if it's only about atime updates, with the default
relatime mount option, I'm not sure there's enough of a win to justify
having a lazyatime-only option. If you really need strict c/mtime
after a crash, maybe the best thing to do is to just simply not use
the lazytime mount option and be done with it.

Cheers,

                                        - Ted
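[Editorial sketch, not part of the original message.] The database pattern
Ted refers to, overwriting already-allocated blocks and then calling
fdatasync(), looks roughly like the following from user space. The helper
names open_datafile() and overwrite_and_fdatasync() are hypothetical; only
open(), pwrite() and fdatasync() are real POSIX calls. fdatasync() flushes
the file data but is allowed to skip metadata, such as mtime, that is not
needed to read the data back, which is why such applications have no
timestamp guarantee after a crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) the data file for read/write. */
int open_datafile(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Write len bytes at offset and flush the data only.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int overwrite_and_fdatasync(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;
        }
        if (n == 0)
            return -1;      /* no progress; give up rather than spin */
        p += n;
        len -= (size_t)n;
        offset += n;
    }
    /* Data is flushed; a pure mtime change need not be written here. */
    return fdatasync(fd);
}
```

When the target range is already allocated, the overwrite changes neither
i_size nor i_blocks, so the only inode change is the timestamp. That is the
case where, per Ted's description, the periodic mtime writeback is pure
overhead for the application.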
* Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
From: Andreas Dilger @ 2014-12-02 20:37 UTC
To: Theodore Ts'o
Cc: Boaz Harrosh, Jan Kara, xfs, linux-fsdevel, linux-ext4, linux-btrfs

On Dec 2, 2014, at 12:23 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Tue, Dec 02, 2014 at 07:55:48PM +0200, Boaz Harrosh wrote:
>>
>> This I do not understand. I thought that I_DIRTY_TIME, and the whole
>> lazytime mount option, is only for atime. So if there are dirty
>> pages then there are also m/ctime that changed and surely we want to
>> write these times to disk ASAP.
>
> What are the situations where you are most concerned about mtime or
> ctime being accurate after a crash?
>
> I've been running with it on my laptop for a while now, and it's
> certainly not a problem for build trees; remember, whenever you need
> to update the inode to update i_blocks or i_size, the inode (with its
> updated timestamps) will be flushed to disk anyway.

[snip]

> I'm not aware of an application which is doing a large number of
> non-allocating random writes (for example, such as a database), where
> said database actually cares about mtime being correct.

[snip]

> Did you have such a use case or application in mind?

One thing that comes to mind is touch/utimes()/utimensat(). Those
should definitely not result in timestamps being kept only in memory
for 24h, since the whole point of those calls is to update the times.
It makes sense for these APIs to dirty the inode for proper writeout.

Cheers, Andreas

>> If we are lazytime also with m/ctime, then I think I would like an
>> option for only atime lazy, because m/ctime is cardinal to some
>> operations even though I might want atime lazy.
>
> If there's a sufficiently compelling use case where we do actually
> care about mtime/ctime being accurate, and the current semantics don't
> provide enough of a guarantee, it's certainly something we could do.
> I'd rather keep things simple unless it's really there. (After all,
> we did create the strictatime mount option, but I'm not sure anyone
> ever ends up using it. It would be a shame if we created a
> strictcmtime, which had the same usage rate.)
>
> I'll also note that if it's only about atime updates, with the default
> relatime mount option, I'm not sure there's enough of a win to justify
> having a lazyatime-only option. If you really need strict c/mtime
> after a crash, maybe the best thing to do is to just simply not use
> the lazytime mount option and be done with it.
>
> Cheers,
>
>                                       - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
* Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
From: Theodore Ts'o @ 2014-12-02 21:01 UTC
To: Andreas Dilger
Cc: Boaz Harrosh, Jan Kara, xfs, linux-fsdevel, linux-ext4, linux-btrfs

On Tue, Dec 02, 2014 at 01:37:27PM -0700, Andreas Dilger wrote:
>
> One thing that comes to mind is touch/utimes()/utimensat(). Those
> should definitely not result in timestamps being kept only in memory
> for 24h, since the whole point of those calls is to update the times.
> It makes sense for these APIs to dirty the inode for proper writeout.

Not a problem. Touch/utimes* go through notify_change() and
->setattr, so they won't go through the I_DIRTY_TIME code path.

                                        - Ted
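[Editorial sketch, not part of the original message.] For reference, this is
what the explicit timestamp update Andreas mentions looks like from user
space. The helpers touch_with_time() and mtime_of() are hypothetical names;
open(), utimensat() and stat() are the real POSIX calls. The sketch only
shows the API; whether the new times reach disk promptly under lazytime is
the kernel-side question Ted answers above, and it cannot be demonstrated
from a user-space program like this one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create `path` if needed, then set atime and mtime
 * explicitly, roughly what touch(1) with an explicit date does. */
int touch_with_time(const char *path, time_t secs)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    struct timespec times[2] = {
        { .tv_sec = secs, .tv_nsec = 0 },   /* times[0]: access time */
        { .tv_sec = secs, .tv_nsec = 0 },   /* times[1]: modification time */
    };
    return utimensat(AT_FDCWD, path, times, 0);
}

/* Read back the modification time; returns 0 on success, -1 on error. */
int mtime_of(const char *path, time_t *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *out = st.st_mtime;
    return 0;
}
```

A caller would invoke touch_with_time("somefile", t) and can confirm the
in-memory result with mtime_of(). The point of the exchange above is that
this is an explicit attribute change requested by the application, unlike
the implicit timestamp updates caused by reads and writes that lazytime
defers.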