* [PATCH] bdi: use deferable timer for sync_supers task
From: Yong Wang @ 2010-10-08 8:35 UTC
To: Jens Axboe, Christoph Hellwig, Artem Bityutskiy, Wu Fengguang
Cc: linux-kernel, linux-mm, xia.wu

sync_supers task currently wakes up periodically for superblock
writeback. This hurts power on battery driven devices. This patch
turns this housekeeping timer into a deferrable timer so that it
does not fire when system is really idle.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Signed-off-by: Xia Wu <xia.wu@intel.com>
---
 mm/backing-dev.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 65d4204..9a8daa5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -238,7 +238,9 @@ static int __init default_bdi_init(void)
 	sync_supers_tsk = kthread_run(bdi_sync_supers, NULL, "sync_supers");
 	BUG_ON(IS_ERR(sync_supers_tsk));
 
-	setup_timer(&sync_supers_timer, sync_supers_timer_fn, 0);
+	init_timer_deferrable(&sync_supers_timer);
+	sync_supers_timer.function = sync_supers_timer_fn;
+	sync_supers_timer.data = 0;
 	bdi_arm_supers_timer();
 
 	err = bdi_init(&default_backing_dev_info);
-- 
1.5.5.1

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] bdi: use deferable timer for sync_supers task
From: Christoph Hellwig @ 2010-10-08 9:25 UTC
To: Yong Wang
Cc: Jens Axboe, Christoph Hellwig, Artem Bityutskiy, Wu Fengguang, linux-kernel, linux-mm, xia.wu

On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> sync_supers task currently wakes up periodically for superblock
> writeback. This hurts power on battery driven devices. This patch
> turns this housekeeping timer into a deferrable timer so that it
> does not fire when system is really idle.

How long can the timer be deferred? We can't simply stop writing
out data for a long time. I think the current timer value should be
the upper bound, but allowing to fire earlier to run during the
same wakeup cycle as others is fine.
* Re: [PATCH] bdi: use deferable timer for sync_supers task
From: Artem Bityutskiy @ 2010-10-08 10:02 UTC
To: ext Christoph Hellwig
Cc: Yong Wang, Jens Axboe, Wu Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org, xia.wu@intel.com

On Fri, 2010-10-08 at 11:25 +0200, ext Christoph Hellwig wrote:
> On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > sync_supers task currently wakes up periodically for superblock
> > writeback. This hurts power on battery driven devices. This patch
> > turns this housekeeping timer into a deferrable timer so that it
> > does not fire when system is really idle.
>
> How long can the timer be deferred? We can't simply stop writing
> out data for a long time. I think the current timer value should be
> the upper bound, but allowing to fire earlier to run during the
> same wakeup cycle as others is fine.

Infinitely. There are range hrtimers which can do exactly what you
said - you specify the hard and soft limits there.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* RE: [PATCH] bdi: use deferable timer for sync_supers task
From: Wu, Xia @ 2010-10-08 10:04 UTC
To: Christoph Hellwig, Yong Wang
Cc: Jens Axboe, Artem Bityutskiy, Wu, Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > sync_supers task currently wakes up periodically for superblock
> > writeback. This hurts power on battery driven devices. This patch
> > turns this housekeeping timer into a deferrable timer so that it
> > does not fire when system is really idle.
> How long can the timer be deferred? We can't simply stop writing
> out data for a long time. I think the current timer value should be
> the upper bound, but allowing to fire earlier to run during the
> same wakeup cycle as others is fine.

If the system is in sleep state, this timer can be deferred to the next wake-up interrupt.
If the system is busy, this timer will fire at the scheduled time.
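To make the semantics Xia describes concrete, here is a small userspace model of when a deferrable timer actually runs. It is a sketch under simplifying assumptions (one CPU, the other wake-up times known in advance), not the kernel's timer code, and the function name deferrable_fire_time is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model (not kernel code) of when a deferrable timer runs.
 * Times are in milliseconds.
 *
 * wakeups[] lists, in ascending order, the moments the CPU is woken by
 * something other than this timer (device interrupts, ordinary timers).
 * An ordinary timer would fire exactly at 'expires'. A deferrable timer
 * that expires while the CPU is idle does not wake it up; it runs at the
 * first wakeup at or after 'expires'.
 *
 * Returns that firing time, or -1 when no later wakeup is known, i.e.
 * the timer stays deferred for as long as the system stays idle.
 */
long deferrable_fire_time(long expires, const long *wakeups, size_t n)
{
    assert(wakeups != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (wakeups[i] >= expires)
            return wakeups[i];   /* piggyback on someone else's wakeup */
    }
    return -1;                   /* nothing wakes the CPU: no bound */
}
```

With other wakeups at 1000 ms, 2000 ms and one hour (3600000 ms), a timer due at 1500 ms runs at 2000 ms, but one due at 2500 ms waits the full hour. On a busy system the wakeup list is dense, so the result is essentially the scheduled time; on an idle one there is no upper bound, which is the gap Christoph asks about.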
* RE: [PATCH] bdi: use deferable timer for sync_supers task
From: Artem Bityutskiy @ 2010-10-08 10:09 UTC
To: ext Wu, Xia
Cc: Christoph Hellwig, Yong Wang, Jens Axboe, Wu, Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 2010-10-08 at 12:04 +0200, ext Wu, Xia wrote:
> On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > > sync_supers task currently wakes up periodically for superblock
> > > writeback. This hurts power on battery driven devices. This patch
> > > turns this housekeeping timer into a deferrable timer so that it
> > > does not fire when system is really idle.
>
> > How long can the timer be deferred? We can't simply stop writing
> > out data for a long time. I think the current timer value should be
> > the upper bound, but allowing to fire earlier to run during the
> > same wakeup cycle as others is fine.
>
> If the system is in sleep state, this timer can be deferred to the next wake-up interrupt.
> If the system is busy, this timer will fire at the scheduled time.

However, when the next wake-up interrupt happens is not defined. It can
happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
says is that there should be some guarantee that sb writeout starts,
say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
this. But take a look at the range hrtimers - they do exactly this.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
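Christoph's requirement (the current period as an upper bound, earlier firing allowed) and Artem's soft/hard limits can be modeled the same way. This is a userspace sketch of the semantics only, not the hrtimer implementation; in the kernel, range timeouts are exposed through calls such as schedule_hrtimeout_range(). The function name range_fire_time is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model (not kernel code) of a timer with a soft and a hard
 * expiry, in the spirit of range hrtimers. Times are in milliseconds.
 *
 * The timer may run at any existing wakeup inside [soft, hard], so it
 * can share a wakeup with other activity. If no such wakeup occurs, it
 * forces its own wakeup at 'hard'. wakeups[] is sorted ascending.
 */
long range_fire_time(long soft, long hard, const long *wakeups, size_t n)
{
    assert(soft <= hard);
    assert(wakeups != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (wakeups[i] > hard)
            break;              /* past the window: stop searching */
        if (wakeups[i] >= soft)
            return wakeups[i];  /* share an existing wakeup */
    }
    return hard;                /* nothing to share: fire at the hard limit */
}
```

Using the 5 to 10 second window from the mail, range_fire_time(5000, 10000, w, n) returns 7000 if another wakeup happens at 7000 ms, and 10000 if the next one is a minute away. Unlike the deferrable model, the delay is always bounded by the hard limit.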
* RE: [PATCH] bdi: use deferable timer for sync_supers task
From: Wu, Xia @ 2010-10-08 10:27 UTC
To: Artem.Bityutskiy@nokia.com
Cc: Christoph Hellwig, Yong Wang, Jens Axboe, Wu, Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

> On Fri, 2010-10-08 at 12:04 +0200, ext Wu, Xia wrote:
> > On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > > > sync_supers task currently wakes up periodically for superblock
> > > > writeback. This hurts power on battery driven devices. This patch
> > > > turns this housekeeping timer into a deferrable timer so that it
> > > > does not fire when system is really idle.
> >
> > > How long can the timer be deferred? We can't simply stop writing
> > > out data for a long time. I think the current timer value should be
> > > the upper bound, but allowing to fire earlier to run during the
> > > same wakeup cycle as others is fine.
> >
> > If the system is in sleep state, this timer can be deferred to the next wake-up interrupt.
> > If the system is busy, this timer will fire at the scheduled time.
> However, when the next wake-up interrupt happens is not defined. It can
> happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> says is that there should be some guarantee that sb writeout starts,
> say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> this. But take a look at the range hrtimers - they do exactly this.

If the system is in sleep state, is there any data which should be written? Must
sb writeout start even if there isn't any data?
* RE: [PATCH] bdi: use deferable timer for sync_supers task
From: Artem Bityutskiy @ 2010-10-08 10:28 UTC
To: Wu, Xia
Cc: Christoph Hellwig, Yong Wang, Jens Axboe, Wu, Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > However, when the next wake-up interrupt happens is not defined. It can
> > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > says is that there should be some guarantee that sb writeout starts,
> > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > this. But take a look at the range hrtimers - they do exactly this.
>
> If the system is in sleep state, is there any data which should be written?

Maybe yes, maybe no.

> Must
> sb writeout start even if there isn't any data?

No.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* Re: [PATCH] bdi: use deferable timer for sync_supers task
From: Yong Wang @ 2010-10-08 10:27 UTC
To: Artem Bityutskiy
Cc: Wu, Xia, Christoph Hellwig, Jens Axboe, Wu, Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > However, when the next wake-up interrupt happens is not defined. It can
> > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > says is that there should be some guarantee that sb writeout starts,
> > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > this. But take a look at the range hrtimers - they do exactly this.
> >
> > If the system is in sleep state, is there any data which should be written?
>
> Maybe yes, maybe no.
>

Thanks for the quick response, Artem. May I know what might need to be
written out when system is really idle?

-Yong
* Re: [PATCH] bdi: use deferable timer for sync_supers task
From: Wu Fengguang @ 2010-10-08 13:57 UTC
To: Yong Wang
Cc: Artem Bityutskiy, Wu, Xia, Christoph Hellwig, Jens Axboe, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 06:27:09PM +0800, Yong Wang wrote:
> On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> > On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > > However, when the next wake-up interrupt happens is not defined. It can
> > > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > > says is that there should be some guarantee that sb writeout starts,
> > > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > > this. But take a look at the range hrtimers - they do exactly this.
> > >
> > > If the system is in sleep state, is there any data which should be written?
> >
> > Maybe yes, maybe no.
> >
>
> Thanks for the quick response, Artem. May I know what might need to be
> written out when system is really idle?

system idle != no dirty inodes

Imagine an application dirties 100MB data and quits. The system then
goes quiet for very long time. In this case we still want the flusher
thread to wake up within 30 seconds to flush the 100MB dirty data.
It's a contract that dirty data will be synced to disk after 30s
(which is the default value of /proc/sys/vm/dirty_expire_centisecs).

Note that 30s is not an exact value. A dirty page may be synced to
disk when it's been dirtied for 35s. The 5s error comes from the
flusher wakeup interval (/proc/sys/vm/dirty_writeback_centisecs).

Thanks,
Fengguang
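Fengguang's "30s, but up to 35s" arithmetic can be checked with a small userspace model. It assumes, as a simplification, that the flusher wakes exactly at multiples of dirty_writeback_centisecs and then writes every page older than dirty_expire_centisecs; the real flusher's scheduling is more involved. As his follow-up notes, this concerns data writeback by the flusher threads, not sync_supers. The function name writeback_age is hypothetical.

```c
#include <assert.h>

/*
 * Userspace model of dirty-data writeback latency. All times are in
 * centiseconds, like the sysctls:
 *   expire   ~ /proc/sys/vm/dirty_expire_centisecs    (default 3000)
 *   interval ~ /proc/sys/vm/dirty_writeback_centisecs (default 500)
 *
 * Simplifying assumption: the flusher wakes exactly at multiples of
 * 'interval' and writes every page whose age has reached 'expire'.
 * Returns the age of a page dirtied at 'dirtied_at' when it is written.
 */
long writeback_age(long dirtied_at, long expire, long interval)
{
    assert(interval > 0 && expire >= 0 && dirtied_at >= 0);
    long due = dirtied_at + expire;  /* page becomes eligible here */
    /* round up to the next flusher wakeup at or after 'due' */
    long wake = ((due + interval - 1) / interval) * interval;
    return wake - dirtied_at;
}
```

With the defaults, a page dirtied exactly on a flusher wakeup is written at age 3000 (30 s), while one dirtied 1 cs later waits until age 3499, just under expire + interval = 3500 (35 s). That residual is the "5s error" in the mail.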
* Re: [PATCH] bdi: use deferable timer for sync_supers task
From: Wu Fengguang @ 2010-10-08 14:42 UTC
To: Yong Wang
Cc: Artem Bityutskiy, Wu, Xia, Christoph Hellwig, Jens Axboe, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 09:57:04PM +0800, Wu Fengguang wrote:
> On Fri, Oct 08, 2010 at 06:27:09PM +0800, Yong Wang wrote:
> > On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> > > On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > > > However, when the next wake-up interrupt happens is not defined. It can
> > > > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > > > says is that there should be some guarantee that sb writeout starts,
> > > > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > > > this. But take a look at the range hrtimers - they do exactly this.
> > > >
> > > > If the system is in sleep state, is there any data which should be written?
> > >
> > > Maybe yes, maybe no.
> > >
> >
> > Thanks for the quick response, Artem. May I know what might need to be
> > written out when system is really idle?
>
> system idle != no dirty inodes

Ah sorry -- I missed the context. Please ignore the following paragraphs for sync_supers.

Thanks,
Fengguang

> Imagine an application dirties 100MB data and quits. The system then
> goes quiet for very long time. In this case we still want the flusher
> thread to wake up within 30 seconds to flush the 100MB dirty data.
> It's a contract that dirty data will be synced to disk after 30s
> (which is the default value of /proc/sys/vm/dirty_expire_centisecs).
>
> Note that 30s is not an exact value. A dirty page may be synced to
> disk when it's been dirtied for 35s. The 5s error comes from the
> flusher wakeup interval (/proc/sys/vm/dirty_writeback_centisecs).
>
> Thanks,
> Fengguang
* Re: [PATCH] bdi: use deferable timer for sync_supers task
From: Artem Bityutskiy @ 2010-10-08 13:59 UTC
To: Yong Wang
Cc: Wu, Xia, Christoph Hellwig, Jens Axboe, Wu, Fengguang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 2010-10-08 at 18:27 +0800, Yong Wang wrote:
> On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> > On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > > However, when the next wake-up interrupt happens is not defined. It can
> > > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > > says is that there should be some guarantee that sb writeout starts,
> > > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > > this. But take a look at the range hrtimers - they do exactly this.
> > >
> > > If the system is in sleep state, is there any data which should be written?
> >
> > Maybe yes, maybe no.
> >
>
> Thanks for the quick response, Artem. May I know what might need to be
> written out when system is really idle?

I do not understand the question. There is dirty data, and it should be
flushed within some time interval.

Anyway, to make the long story short, I made an attempt to optimize this
and stop arming the timer when we have no dirty data. But my solution
was not accepted, and Al asked me to just get rid of this timer and the
whole sync_supers(). He said this should be pushed down to individual FSes.
I guess the idea is that:
1) some FSes actually abuse sb syncing, e.g., JFFS2;
2) other FSes can eventually optimize things for themselves.

But I have not found time to do this so far.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
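Artem does not show his patch, so the following is only a hypothetical userspace model of the idea he describes: arm the periodic timer on the clean-to-dirty transition and do not re-arm it while nothing is dirty, so an idle system with clean superblocks gets no wakeups at all. The struct and function names are invented for illustration and are not kernel APIs; this is also not the direction Al preferred (removing sync_supers() in favor of per-filesystem handling).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct sb_model {
    bool dirty;        /* superblock has unwritten changes */
    bool timer_armed;  /* periodic writeback timer is pending */
    int  writes;       /* number of superblock writeouts performed */
};

/* Called whenever the filesystem dirties its superblock. */
void sb_mark_dirty(struct sb_model *sb)
{
    sb->dirty = true;
    /* Arm only if not already pending, so repeated dirtying does not
     * keep pushing the deadline out. */
    if (!sb->timer_armed)
        sb->timer_armed = true;
}

/* Timer callback: write back if needed, then stay disarmed. */
void sb_timer_fire(struct sb_model *sb)
{
    assert(sb->timer_armed);
    sb->timer_armed = false;
    if (sb->dirty) {
        sb->writes++;
        sb->dirty = false;
    }
    /* No re-arm here: the next sb_mark_dirty() arms the timer again. */
}
```

After one sb_mark_dirty() and one sb_timer_fire(), the model has performed one writeout and has no timer pending, so nothing wakes the system until the superblock is dirtied again. This matches Artem's answer to Xia that writeout need not start when there is no data.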