* [PATCH-v2 0/2] lazytime bug fixes for 4.0 @ 2015-03-16 19:14 Theodore Ts'o 2015-03-16 19:14 ` [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written Theodore Ts'o 2015-03-16 19:14 ` [PATCH-v2 2/2] fs: add dirtytime_expire_seconds sysctl Theodore Ts'o 0 siblings, 2 replies; 9+ messages in thread From: Theodore Ts'o @ 2015-03-16 19:14 UTC (permalink / raw) To: Linux Filesystem Development List; +Cc: jack, torvalds, viro, Theodore Ts'o Apologies for the delay, but Vault and LSF/MM kept me busy last week. These patches fix the issues which Jan pointed out. They aren't serious; this fixes an issue which might result in some timestamps becoming stale until the file system is unmounted or syncfs(2) is run on it. Still, it's nice to fix them so the guarantees are as they have been documented and as people would expect. Theodore Ts'o (2): fs: make sure the timestamps for lazytime inodes eventually get written fs: add dirtytime_expire_seconds sysctl fs/fs-writeback.c | 93 ++++++++++++++++++++++++++++++++++++++++++----- include/linux/fs.h | 1 + include/linux/writeback.h | 3 ++ kernel/sysctl.c | 8 ++++ 4 files changed, 95 insertions(+), 10 deletions(-) -- 2.3.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written 2015-03-16 19:14 [PATCH-v2 0/2] lazytime bug fixes for 4.0 Theodore Ts'o @ 2015-03-16 19:14 ` Theodore Ts'o 2015-03-16 21:34 ` Andreas Dilger 2015-03-17 10:29 ` Jan Kara 2015-03-16 19:14 ` [PATCH-v2 2/2] fs: add dirtytime_expire_seconds sysctl Theodore Ts'o 1 sibling, 2 replies; 9+ messages in thread From: Theodore Ts'o @ 2015-03-16 19:14 UTC (permalink / raw) To: Linux Filesystem Development List Cc: jack, torvalds, viro, Theodore Ts'o, stable Jan Kara pointed out that if there is an inode which is constantly getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp will never be written since inode->dirtied_when is constantly getting updated. We fix this by adding an extra field to the inode, dirtied_time_when, so inodes with a stale dirtytime can get detected and handled. In addition, if we have a dirtytime inode caused by an atime update, and there is no write activity on the file system, we need to have a secondary system to make sure these inodes get written out. We do this by setting up a second delayed work structure which wakes up the CPU much more rarely compared to writeback_expire_centisecs. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org --- fs/fs-writeback.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++------- include/linux/fs.h | 1 + 2 files changed, 73 insertions(+), 10 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index e907052..ae13fba 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -53,6 +53,18 @@ struct wb_writeback_work { struct completion *done; /* set if the caller waits */ }; +/* + * If an inode is constantly having its pages dirtied, but then the + * updates stop dirtytime_expire_interval seconds in the past, it's + * possible for the worst case time between when an inode has its + * timestamps updated and when they finally get written out to be two + * dirtytime_expire_intervals. 
We set the default to 12 hours (in + * seconds), which means most of the time inodes will have their + * timestamps written to disk after 12 hours, but in the worst case a + * few inodes might not their timestamps updated for 24 hours. + */ +unsigned int dirtytime_expire_interval = 12 * 60 * 60; + /** * writeback_in_progress - determine whether there is writeback in progress * @bdi: the device's backing_dev_info structure. @@ -275,8 +287,8 @@ static int move_expired_inodes(struct list_head *delaying_queue, if ((flags & EXPIRE_DIRTY_ATIME) == 0) older_than_this = work->older_than_this; - else if ((work->reason == WB_REASON_SYNC) == 0) { - expire_time = jiffies - (HZ * 86400); + else if (!work->for_sync) { + expire_time = jiffies - (dirtytime_expire_interval * HZ); older_than_this = &expire_time; } while (!list_empty(delaying_queue)) { @@ -458,6 +470,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb, */ redirty_tail(inode, wb); } else if (inode->i_state & I_DIRTY_TIME) { + inode->dirtied_when = jiffies; list_move(&inode->i_wb_list, &wb->b_dirty_time); } else { /* The inode is clean. Remove from writeback lists. 
*/ @@ -505,12 +518,17 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) spin_lock(&inode->i_lock); dirty = inode->i_state & I_DIRTY; - if (((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) && - (inode->i_state & I_DIRTY_TIME)) || - (inode->i_state & I_DIRTY_TIME_EXPIRED)) { - dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED; - trace_writeback_lazytime(inode); - } + if (inode->i_state & I_DIRTY_TIME) { + if ((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) || + unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) || + unlikely(time_after((inode->dirtied_time_when + + dirtytime_expire_interval * HZ), + jiffies))) { + dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED; + trace_writeback_lazytime(inode); + } + } else + inode->i_state &= ~I_DIRTY_TIME_EXPIRED; inode->i_state &= ~dirty; /* @@ -1131,6 +1149,45 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) rcu_read_unlock(); } +/* + * Wake up bdi's periodically to make sure dirtytime inodes gets + * written back periodically. We deliberately do *not* check the + * b_dirtytime list in wb_has_dirty_io(), since this would cause the + * kernel to be constantly waking up once there are any dirtytime + * inodes on the system. So instead we define a separate delayed work + * function which gets called much more rarely. (By default, only + * once every 12 hours.) + * + * If there is any other write activity going on in the file system, + * this function won't be necessary. But if the only thing that has + * happened on the file system is a dirtytime inode caused by an atime + * update, we need this infrastructure below to make sure that inode + * eventually gets pushed out to disk. 
+ */ +static void wakeup_dirtytime_writeback(struct work_struct *w); +static DECLARE_DELAYED_WORK(dirtytime_work, wakeup_dirtytime_writeback); + +static void wakeup_dirtytime_writeback(struct work_struct *w) +{ + struct backing_dev_info *bdi; + + rcu_read_lock(); + list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { + if (list_empty(&bdi->wb.b_dirty_time)) + continue; + bdi_wakeup_thread(bdi); + } + rcu_read_unlock(); + schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); +} + +static int __init start_dirtytime_writeback(void) +{ + schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); + return 0; +} +__initcall(start_dirtytime_writeback); + static noinline void block_dump___mark_inode_dirty(struct inode *inode) { if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) { @@ -1269,8 +1326,13 @@ void __mark_inode_dirty(struct inode *inode, int flags) } inode->dirtied_when = jiffies; - list_move(&inode->i_wb_list, dirtytime ? - &bdi->wb.b_dirty_time : &bdi->wb.b_dirty); + if (dirtytime) + inode->dirtied_time_when = jiffies; + if (inode->i_state & (I_DIRTY_INODE | I_DIRTY_PAGES)) + list_move(&inode->i_wb_list, &bdi->wb.b_dirty); + else + list_move(&inode->i_wb_list, + &bdi->wb.b_dirty_time); spin_unlock(&bdi->wb.list_lock); trace_writeback_dirty_inode_enqueue(inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index b4d71b5..f4131e8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -604,6 +604,7 @@ struct inode { struct mutex i_mutex; unsigned long dirtied_when; /* jiffies of first dirtying */ + unsigned long dirtied_time_when; struct hlist_node i_hash; struct list_head i_wb_list; /* backing dev IO list */ -- 2.3.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written 2015-03-16 19:14 ` [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written Theodore Ts'o @ 2015-03-16 21:34 ` Andreas Dilger 2015-03-17 10:33 ` Jan Kara 2015-03-17 15:09 ` Theodore Ts'o 2015-03-17 10:29 ` Jan Kara 1 sibling, 2 replies; 9+ messages in thread From: Andreas Dilger @ 2015-03-16 21:34 UTC (permalink / raw) To: Theodore Ts'o Cc: Linux Filesystem Development List, jack, torvalds, viro, stable On Mar 16, 2015, at 1:14 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > Jan Kara pointed out that if there is an inode which is constantly > getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp > will never be written since inode->dirtied_when is constantly getting > updated. We fix this by adding an extra field to the inode, > dirtied_time_when, so inodes with a stale dirtytime can get detected > and handled. The drawback here is that this adds another 8 bytes to every inode for a field of marginal value, since this is only important for the rare case of a file that is being dirtied continuously. I wonder if something more lightweight could be added to avoid this problem? For example, we only care about this case if it has been going on for more than the lazytime interval (about a day), so the inode could store a 16-bit i_dirtied_time_when that is approximately (jiffies >> bits_in_a_half_a_day) and only check time_after() that. The __u16 could fit into some existing hole (e.g. after i_bytes on my kernel) and avoid expanding the size of the inode at all. The remaining high bits of i_dirtied_time_when would be irrelevant, since a __u16 of half-days is about 80 years, so it would be enough to compare: time_after(i_dirtied_time_when, (__u16)(jiffies >> bits_in_half_a_day)) A day is 86400s, so 43200s is close to (1 << 22) jiffies for HZ=100, and (1 << 25) jiffies is about 3/8 of a day for HZ=1000. 
Since the exact times for inode writeout don't matter very much here, having only shifts to convert jiffies to i_dirtied_time_when in the kernel is better I think. Minor issue, is there a good reason why dirtied_time_when doesn't have an "i_" prefix? Cheers, Andreas > In addition, if we have a dirtytime inode caused by an atime update, > and there is no write activity on the file system, we need to have a > secondary system to make sure these inodes get written out. We do > this by setting up a second delayed work structure which wakes up the > CPU much more rarely compared to writeback_expire_centisecs. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > Cc: stable@vger.kernel.org > --- > fs/fs-writeback.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++------- > include/linux/fs.h | 1 + > 2 files changed, 73 insertions(+), 10 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index e907052..ae13fba 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -53,6 +53,18 @@ struct wb_writeback_work { > struct completion *done; /* set if the caller waits */ > }; > > +/* > + * If an inode is constantly having its pages dirtied, but then the > + * updates stop dirtytime_expire_interval seconds in the past, it's > + * possible for the worst case time between when an inode has its > + * timestamps updated and when they finally get written out to be two > + * dirtytime_expire_intervals. We set the default to 12 hours (in > + * seconds), which means most of the time inodes will have their > + * timestamps written to disk after 12 hours, but in the worst case a > + * few inodes might not their timestamps updated for 24 hours. > + */ > +unsigned int dirtytime_expire_interval = 12 * 60 * 60; > + > /** > * writeback_in_progress - determine whether there is writeback in progress > * @bdi: the device's backing_dev_info structure. 
> @@ -275,8 +287,8 @@ static int move_expired_inodes(struct list_head *delaying_queue, > > if ((flags & EXPIRE_DIRTY_ATIME) == 0) > older_than_this = work->older_than_this; > - else if ((work->reason == WB_REASON_SYNC) == 0) { > - expire_time = jiffies - (HZ * 86400); > + else if (!work->for_sync) { > + expire_time = jiffies - (dirtytime_expire_interval * HZ); > older_than_this = &expire_time; > } > while (!list_empty(delaying_queue)) { > @@ -458,6 +470,7 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb, > */ > redirty_tail(inode, wb); > } else if (inode->i_state & I_DIRTY_TIME) { > + inode->dirtied_when = jiffies; > list_move(&inode->i_wb_list, &wb->b_dirty_time); > } else { > /* The inode is clean. Remove from writeback lists. */ > @@ -505,12 +518,17 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > spin_lock(&inode->i_lock); > > dirty = inode->i_state & I_DIRTY; > - if (((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) && > - (inode->i_state & I_DIRTY_TIME)) || > - (inode->i_state & I_DIRTY_TIME_EXPIRED)) { > - dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED; > - trace_writeback_lazytime(inode); > - } > + if (inode->i_state & I_DIRTY_TIME) { > + if ((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) || > + unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) || > + unlikely(time_after((inode->dirtied_time_when + > + dirtytime_expire_interval * HZ), > + jiffies))) { > + dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED; > + trace_writeback_lazytime(inode); > + } > + } else > + inode->i_state &= ~I_DIRTY_TIME_EXPIRED; > inode->i_state &= ~dirty; > > /* > @@ -1131,6 +1149,45 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason) > rcu_read_unlock(); > } > > +/* > + * Wake up bdi's periodically to make sure dirtytime inodes gets > + * written back periodically. 
We deliberately do *not* check the > + * b_dirtytime list in wb_has_dirty_io(), since this would cause the > + * kernel to be constantly waking up once there are any dirtytime > + * inodes on the system. So instead we define a separate delayed work > + * function which gets called much more rarely. (By default, only > + * once every 12 hours.) > + * > + * If there is any other write activity going on in the file system, > + * this function won't be necessary. But if the only thing that has > + * happened on the file system is a dirtytime inode caused by an atime > + * update, we need this infrastructure below to make sure that inode > + * eventually gets pushed out to disk. > + */ > +static void wakeup_dirtytime_writeback(struct work_struct *w); > +static DECLARE_DELAYED_WORK(dirtytime_work, wakeup_dirtytime_writeback); > + > +static void wakeup_dirtytime_writeback(struct work_struct *w) > +{ > + struct backing_dev_info *bdi; > + > + rcu_read_lock(); > + list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) { > + if (list_empty(&bdi->wb.b_dirty_time)) > + continue; > + bdi_wakeup_thread(bdi); > + } > + rcu_read_unlock(); > + schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); > +} > + > +static int __init start_dirtytime_writeback(void) > +{ > + schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ); > + return 0; > +} > +__initcall(start_dirtytime_writeback); > + > static noinline void block_dump___mark_inode_dirty(struct inode *inode) > { > if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) { > @@ -1269,8 +1326,13 @@ void __mark_inode_dirty(struct inode *inode, int flags) > } > > inode->dirtied_when = jiffies; > - list_move(&inode->i_wb_list, dirtytime ? 
> - &bdi->wb.b_dirty_time : &bdi->wb.b_dirty); > + if (dirtytime) > + inode->dirtied_time_when = jiffies; > + if (inode->i_state & (I_DIRTY_INODE | I_DIRTY_PAGES)) > + list_move(&inode->i_wb_list, &bdi->wb.b_dirty); > + else > + list_move(&inode->i_wb_list, > + &bdi->wb.b_dirty_time); > spin_unlock(&bdi->wb.list_lock); > trace_writeback_dirty_inode_enqueue(inode); > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index b4d71b5..f4131e8 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -604,6 +604,7 @@ struct inode { > struct mutex i_mutex; > > unsigned long dirtied_when; /* jiffies of first dirtying */ > + unsigned long dirtied_time_when; > > struct hlist_node i_hash; > struct list_head i_wb_list; /* backing dev IO list */ > -- > 2.3.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written 2015-03-16 21:34 ` Andreas Dilger @ 2015-03-17 10:33 ` Jan Kara 2015-03-17 15:53 ` Theodore Ts'o 2015-03-17 15:09 ` Theodore Ts'o 1 sibling, 1 reply; 9+ messages in thread From: Jan Kara @ 2015-03-17 10:33 UTC (permalink / raw) To: Andreas Dilger Cc: Theodore Ts'o, Linux Filesystem Development List, jack, torvalds, viro, stable On Mon 16-03-15 15:34:12, Andreas Dilger wrote: > On Mar 16, 2015, at 1:14 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > > > Jan Kara pointed out that if there is an inode which is constantly > > getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp > > will never be written since inode->dirtied_when is constantly getting > > updated. We fix this by adding an extra field to the inode, > > dirtied_time_when, so inodes with a stale dirtytime can get detected > > and handled. > > The drawback here is that this adds another 8 bytes to every inode for > a field of marginal value, since this is only important for the rare > case of a file that is being dirtied continuously. Yes. > I wonder if something more lightweight could be added to avoid this > problem? For example, we only care about this case if it has been > going on for more than the lazytime interval (about a day), so the > inode could store a 16-bit i_dirtied_time_when that is approximately > (jiffies >> bits_in_a_half_a_day) and only check time_after() that. > The __u16 could fit into some existing hole (e.g. after i_bytes on my > kernel) and avoid expanding the size of the inode at all. > > The remaining high bits of i_dirtied_time_when would be irrelevant, since > a __u16 of half-days is about 80 years, so it would be enough to compare: > > > time_after(i_dirtied_time_when, (__u16)(jiffies >> bits_in_half_a_day)) > > > A day is 86400s, so 43200s is close to (1 << 22) jiffies for HZ=100, and > (1 << 25) jiffies is about 3/8 of a day for HZ=1000. 
Since the exact > times for inode writeout don't matter very much here, having only shifts > to convert jiffies to i_dirtied_time_when in the kernel is better I think. Yes, something like this should be possible. But I wanted that to happen as a separate patch once we have everything working correctly. The code is subtle enough that I didn't want Ted to complicate it with further optimizations initially. > Minor issue, is there a good reason why dirtied_time_when doesn't have an > "i_" prefix? I guess it's matching with dirtied_when which doesn't have an i_ prefix just because no one added it initially. I don't really care either way. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written 2015-03-17 10:33 ` Jan Kara @ 2015-03-17 15:53 ` Theodore Ts'o 0 siblings, 0 replies; 9+ messages in thread From: Theodore Ts'o @ 2015-03-17 15:53 UTC (permalink / raw) To: Jan Kara Cc: Andreas Dilger, Linux Filesystem Development List, torvalds, viro, stable On Tue, Mar 17, 2015 at 11:33:37AM +0100, Jan Kara wrote: > Yes, something like this should be possible. But I wanted that to happen > as a separate patch once we have everything working correctly. The code is > subtle enough that I didn't want Ted to complicate it with further > optimizations initially. Yes, agreed, I took a closer look at it and making the change that Andreas suggested is more complicated than it first seems. We can try to fix this later, but it's definitely messy. - Ted ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written 2015-03-16 21:34 ` Andreas Dilger 2015-03-17 10:33 ` Jan Kara @ 2015-03-17 15:09 ` Theodore Ts'o 1 sibling, 0 replies; 9+ messages in thread From: Theodore Ts'o @ 2015-03-17 15:09 UTC (permalink / raw) To: Andreas Dilger Cc: Linux Filesystem Development List, jack, torvalds, viro, stable On Mon, Mar 16, 2015 at 03:34:12PM -0600, Andreas Dilger wrote: > I wonder if something more lightweight could be added to avoid this > problem? For example, we only care about this case if it has been > going on for more than the lazytime interval (about a day), so the > inode could store a 16-bit i_dirtied_time_when that is approximately > (jiffies >> bits_in_a_half_a_day) and only check time_after() that. > The __u16 could fit into some existing hole (e.g. after i_bytes on my > kernel) and avoid expanding the size of the inode at all. > > The remaining high bits of i_dirtied_time_when would be irrelevant, since > a __u16 of half-days is about 80 years, so it would be enough to compare: > > > time_after(i_dirtied_time_when, (__u16)(jiffies >> bits_in_half_a_day)) That won't work correctly; we'd have to do something like this #define u16_after(a,b) (typecheck(__u16, a) && typecheck(__u16, b) && \ ((__s16)((b) - (a)) < 0)) > Minor issue, is there a good reason why dirtied_time_when doesn't have an > "i_" prefix? It's because dirtied_when also doesn't have an i_ prefix, but arguably it should. - Ted ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written 2015-03-16 19:14 ` [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written Theodore Ts'o 2015-03-16 21:34 ` Andreas Dilger @ 2015-03-17 10:29 ` Jan Kara 1 sibling, 0 replies; 9+ messages in thread From: Jan Kara @ 2015-03-17 10:29 UTC (permalink / raw) To: Theodore Ts'o Cc: Linux Filesystem Development List, jack, torvalds, viro, stable On Mon 16-03-15 15:14:19, Ted Tso wrote: > Jan Kara pointed out that if there is an inode which is constantly > getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp > will never be written since inode->dirtied_when is constantly getting > updated. We fix this by adding an extra field to the inode, > dirtied_time_when, so inodes with a stale dirtytime can get detected > and handled. > > In addition, if we have a dirtytime inode caused by an atime update, > and there is no write activity on the file system, we need to have a > secondary system to make sure these inodes get written out. We do > this by setting up a second delayed work structure which wakes up the > CPU much more rarely compared to writeback_expire_centisecs. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > Cc: stable@vger.kernel.org Since this was merged in rc1, I don't think CC to stable is needed. 
> @@ -505,12 +518,17 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > spin_lock(&inode->i_lock); > > dirty = inode->i_state & I_DIRTY; > - if (((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) && > - (inode->i_state & I_DIRTY_TIME)) || > - (inode->i_state & I_DIRTY_TIME_EXPIRED)) { > - dirty |= I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED; > - trace_writeback_lazytime(inode); > - } > + if (inode->i_state & I_DIRTY_TIME) { > + if ((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) || > + unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) || > + unlikely(time_after((inode->dirtied_time_when + > + dirtytime_expire_interval * HZ), > + jiffies))) { The time comparison is the other way around, isn't it? After fixing that feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH-v2 2/2] fs: add dirtytime_expire_seconds sysctl 2015-03-16 19:14 [PATCH-v2 0/2] lazytime bug fixes for 4.0 Theodore Ts'o 2015-03-16 19:14 ` [PATCH-v2 1/2] fs: make sure the timestamps for lazytime inodes eventually get written Theodore Ts'o @ 2015-03-16 19:14 ` Theodore Ts'o 2015-03-17 10:30 ` Jan Kara 1 sibling, 1 reply; 9+ messages in thread From: Theodore Ts'o @ 2015-03-16 19:14 UTC (permalink / raw) To: Linux Filesystem Development List Cc: jack, torvalds, viro, Theodore Ts'o, stable Add a tuning knob so we can adjust the dirtytime expiration timeout, which is very useful for testing lazytime. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org --- fs/fs-writeback.c | 11 +++++++++++ include/linux/writeback.h | 3 +++ kernel/sysctl.c | 8 ++++++++ 3 files changed, 22 insertions(+) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index ae13fba..d6fa722 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1188,6 +1188,17 @@ static int __init start_dirtytime_writeback(void) } __initcall(start_dirtytime_writeback); +int dirtytime_interval_handler(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + int ret; + + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + if (ret == 0 && write) + mod_delayed_work(system_wq, &dirtytime_work, 0); + return ret; +} + static noinline void block_dump___mark_inode_dirty(struct inode *inode) { if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) { diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 0004833..b2dd371e 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -130,6 +130,7 @@ extern int vm_dirty_ratio; extern unsigned long vm_dirty_bytes; extern unsigned int dirty_writeback_interval; extern unsigned int dirty_expire_interval; +extern unsigned int dirtytime_expire_interval; extern int vm_highmem_is_dirtyable; extern int block_dump; extern int laptop_mode; @@ -146,6 +147,8 @@ extern int 
dirty_ratio_handler(struct ctl_table *table, int write, extern int dirty_bytes_handler(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +int dirtytime_interval_handler(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos); struct ctl_table; int dirty_writeback_centisecs_handler(struct ctl_table *, int, diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 88ea2d6..ce410bb 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1228,6 +1228,14 @@ static struct ctl_table vm_table[] = { .extra1 = &zero, }, { + .procname = "dirtytime_expire_seconds", + .data = &dirtytime_expire_interval, + .maxlen = sizeof(dirty_expire_interval), + .mode = 0644, + .proc_handler = dirtytime_interval_handler, + .extra1 = &zero, + }, + { .procname = "nr_pdflush_threads", .mode = 0444 /* read-only */, .proc_handler = pdflush_proc_obsolete, -- 2.3.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH-v2 2/2] fs: add dirtytime_expire_seconds sysctl 2015-03-16 19:14 ` [PATCH-v2 2/2] fs: add dirtytime_expire_seconds sysctl Theodore Ts'o @ 2015-03-17 10:30 ` Jan Kara 0 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2015-03-17 10:30 UTC (permalink / raw) To: Theodore Ts'o Cc: Linux Filesystem Development List, jack, torvalds, viro, stable On Mon 16-03-15 15:14:20, Ted Tso wrote: > Add a tuning knob so we can adjust the dirtytime expiration timeout, > which is very useful for testing lazytime. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > Cc: stable@vger.kernel.org CC to stable isn't needed. Otherwise: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/fs-writeback.c | 11 +++++++++++ > include/linux/writeback.h | 3 +++ > kernel/sysctl.c | 8 ++++++++ > 3 files changed, 22 insertions(+) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index ae13fba..d6fa722 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -1188,6 +1188,17 @@ static int __init start_dirtytime_writeback(void) > } > __initcall(start_dirtytime_writeback); > > +int dirtytime_interval_handler(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, loff_t *ppos) > +{ > + int ret; > + > + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); > + if (ret == 0 && write) > + mod_delayed_work(system_wq, &dirtytime_work, 0); > + return ret; > +} > + > static noinline void block_dump___mark_inode_dirty(struct inode *inode) > { > if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) { > diff --git a/include/linux/writeback.h b/include/linux/writeback.h > index 0004833..b2dd371e 100644 > --- a/include/linux/writeback.h > +++ b/include/linux/writeback.h > @@ -130,6 +130,7 @@ extern int vm_dirty_ratio; > extern unsigned long vm_dirty_bytes; > extern unsigned int dirty_writeback_interval; > extern unsigned int dirty_expire_interval; > +extern unsigned int dirtytime_expire_interval; > extern int vm_highmem_is_dirtyable; > extern int 
block_dump; > extern int laptop_mode; > @@ -146,6 +147,8 @@ extern int dirty_ratio_handler(struct ctl_table *table, int write, > extern int dirty_bytes_handler(struct ctl_table *table, int write, > void __user *buffer, size_t *lenp, > loff_t *ppos); > +int dirtytime_interval_handler(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, loff_t *ppos); > > struct ctl_table; > int dirty_writeback_centisecs_handler(struct ctl_table *, int, > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 88ea2d6..ce410bb 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1228,6 +1228,14 @@ static struct ctl_table vm_table[] = { > .extra1 = &zero, > }, > { > + .procname = "dirtytime_expire_seconds", > + .data = &dirtytime_expire_interval, > + .maxlen = sizeof(dirty_expire_interval), > + .mode = 0644, > + .proc_handler = dirtytime_interval_handler, > + .extra1 = &zero, > + }, > + { > .procname = "nr_pdflush_threads", > .mode = 0444 /* read-only */, > .proc_handler = pdflush_proc_obsolete, > -- > 2.3.0 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 9+ messages in thread