linux-mm.kvack.org archive mirror
* [PATCH] bdi: use deferable timer for sync_supers task
@ 2010-10-08  8:35 Yong Wang
  2010-10-08  9:25 ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Yong Wang @ 2010-10-08  8:35 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Artem Bityutskiy, Wu Fengguang
  Cc: linux-kernel, linux-mm, xia.wu

sync_supers task currently wakes up periodically for superblock
writeback. This hurts power on battery driven devices. This patch
turns this housekeeping timer into a deferrable timer so that it
does not fire when system is really idle.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Signed-off-by: Xia Wu <xia.wu@intel.com>
---
 mm/backing-dev.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 65d4204..9a8daa5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -238,7 +238,9 @@ static int __init default_bdi_init(void)
 	sync_supers_tsk = kthread_run(bdi_sync_supers, NULL, "sync_supers");
 	BUG_ON(IS_ERR(sync_supers_tsk));
 
-	setup_timer(&sync_supers_timer, sync_supers_timer_fn, 0);
+	init_timer_deferrable(&sync_supers_timer);
+	sync_supers_timer.function = sync_supers_timer_fn;
+	sync_supers_timer.data = 0;
 	bdi_arm_supers_timer();
 
 	err = bdi_init(&default_backing_dev_info);
-- 
1.5.5.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08  8:35 [PATCH] bdi: use deferable timer for sync_supers task Yong Wang
@ 2010-10-08  9:25 ` Christoph Hellwig
  2010-10-08 10:02   ` Artem Bityutskiy
  2010-10-08 10:04   ` Wu, Xia
  0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2010-10-08  9:25 UTC (permalink / raw)
  To: Yong Wang
  Cc: Jens Axboe, Christoph Hellwig, Artem Bityutskiy, Wu Fengguang,
	linux-kernel, linux-mm, xia.wu

On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> sync_supers task currently wakes up periodically for superblock
> writeback. This hurts power on battery driven devices. This patch
> turns this housekeeping timer into a deferrable timer so that it
> does not fire when system is really idle.

How long can the timer be deferred?  We can't simply stop writing
out data for a long time.  I think the current timer value should be
the upper bound, but allowing to fire earlier to run during the
same wakeup cycle as others is fine.


* Re: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08  9:25 ` Christoph Hellwig
@ 2010-10-08 10:02   ` Artem Bityutskiy
  2010-10-08 10:04   ` Wu, Xia
  1 sibling, 0 replies; 11+ messages in thread
From: Artem Bityutskiy @ 2010-10-08 10:02 UTC (permalink / raw)
  To: ext Christoph Hellwig
  Cc: Yong Wang, Jens Axboe, Wu Fengguang, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, xia.wu@intel.com

On Fri, 2010-10-08 at 11:25 +0200, ext Christoph Hellwig wrote:
> On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > sync_supers task currently wakes up periodically for superblock
> > writeback. This hurts power on battery driven devices. This patch
> > turns this housekeeping timer into a deferrable timer so that it
> > does not fire when system is really idle.
> 
> How long can the timer be deferred?  We can't simply stop writing
> out data for a long time.  I think the current timer value should be
> the upper bound, but allowing to fire earlier to run during the
> same wakeup cycle as others is fine.

Infinitely.

There are range hrtimers which can do exactly what you said - you
specify the hard and soft limits there.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* RE: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08  9:25 ` Christoph Hellwig
  2010-10-08 10:02   ` Artem Bityutskiy
@ 2010-10-08 10:04   ` Wu, Xia
  2010-10-08 10:09     ` Artem Bityutskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Wu, Xia @ 2010-10-08 10:04 UTC (permalink / raw)
  To: Christoph Hellwig, Yong Wang
  Cc: Jens Axboe, Artem Bityutskiy, Wu, Fengguang,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org


On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > sync_supers task currently wakes up periodically for superblock
> > writeback. This hurts power on battery driven devices. This patch
> > turns this housekeeping timer into a deferrable timer so that it
> > does not fire when system is really idle.

> How long can the timer be deferred?  We can't simply stop writing
> out data for a long time.  I think the current timer value should be
> the upper bound, but allowing to fire earlier to run during the
> same wakeup cycle as others is fine.

If the system is in sleep state, this timer can be deferred to the next wake-up interrupt.
If the system is busy, this timer will fire at the scheduled time.



* RE: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 10:04   ` Wu, Xia
@ 2010-10-08 10:09     ` Artem Bityutskiy
  2010-10-08 10:27       ` Wu, Xia
  0 siblings, 1 reply; 11+ messages in thread
From: Artem Bityutskiy @ 2010-10-08 10:09 UTC (permalink / raw)
  To: ext Wu, Xia
  Cc: Christoph Hellwig, Yong Wang, Jens Axboe, Wu, Fengguang,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 2010-10-08 at 12:04 +0200, ext Wu, Xia wrote:
> On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > > sync_supers task currently wakes up periodically for superblock
> > > writeback. This hurts power on battery driven devices. This patch
> > > turns this housekeeping timer into a deferrable timer so that it
> > > does not fire when system is really idle.
> 
> > How long can the timer be deferred?  We can't simply stop writing
> > out data for a long time.  I think the current timer value should be
> > the upper bound, but allowing to fire earlier to run during the
> > same wakeup cycle as others is fine.
> 
> If the system is in sleep state, this timer can be deferred to the next wake-up interrupt.
> If the system is busy, this timer will fire at the scheduled time.

However, when the next wake-up interrupt happens is not defined. It can
happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
says is that there should be some guarantee that sb writeout starts,
say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
this. But take a look at the range hrtimers - they do exactly this.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
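
The difference between the two timer kinds discussed above can be illustrated
with a small user-space model. This is only a sketch under simplifying
assumptions, not kernel code: the function names, the millisecond unit, and the
idea of passing in a single "next wakeup" time are all hypothetical. It assumes
a deferrable timer fires on time when the CPU is busy and otherwise waits for
the next unrelated wakeup, while a range timer fires at the first wakeup at or
after its soft expiry but never later than its hard expiry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not a kernel API. All times are in milliseconds. */

/*
 * Deferrable timer: if the CPU is busy at `expiry`, the timer fires on time.
 * If the CPU is idle, it fires only at the next unrelated wakeup, which may
 * be arbitrarily far in the future.
 */
long deferrable_fire_time(long expiry, bool cpu_idle, long next_wakeup)
{
	if (!cpu_idle)
		return expiry;
	return next_wakeup > expiry ? next_wakeup : expiry;
}

/*
 * Range timer: may fire at any wakeup at or after `soft`, but is forced to
 * fire at `hard` at the latest, so the delay is bounded.
 */
long range_fire_time(long soft, long hard, long next_wakeup)
{
	long t;

	assert(soft <= hard);
	t = next_wakeup < soft ? soft : next_wakeup;
	return t > hard ? hard : t;
}
```

With a 5 s expiry and the next wakeup one hour away, the deferrable model
returns the one-hour mark, while the range model with a 5 to 10 s window
returns 10 s. That bounded upper limit is the guarantee asked for earlier in
the thread.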


* RE: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 10:09     ` Artem Bityutskiy
@ 2010-10-08 10:27       ` Wu, Xia
  2010-10-08 10:28         ` Artem Bityutskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Wu, Xia @ 2010-10-08 10:27 UTC (permalink / raw)
  To: Artem.Bityutskiy@nokia.com
  Cc: Christoph Hellwig, Yong Wang, Jens Axboe, Wu, Fengguang,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org


> On Fri, 2010-10-08 at 12:04 +0200, ext Wu, Xia wrote:
> > On Fri, Oct 08, 2010 at 04:35:14PM +0800, Yong Wang wrote:
> > > > sync_supers task currently wakes up periodically for superblock
> > > > writeback. This hurts power on battery driven devices. This patch
> > > > turns this housekeeping timer into a deferrable timer so that it
> > > > does not fire when system is really idle.
> >
> > > How long can the timer be deferred?  We can't simply stop writing
> > > out data for a long time.  I think the current timer value should be
> > > the upper bound, but allowing to fire earlier to run during the
> > > same wakeup cycle as others is fine.
> >
> > If the system is in sleep state, this timer can be deferred to the next wake-up interrupt.
> > If the system is busy, this timer will fire at the scheduled time.

> However, when the next wake-up interrupt happens is not defined. It can
> happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> says is that there should be some guarantee that sb writeout starts,
> say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> this. But take a look at the range hrtimers - they do exactly this.

If the system is in sleep state, is there any data which should be written? Must 
sb writeout start even if there isn't any data?


* Re: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 10:28         ` Artem Bityutskiy
@ 2010-10-08 10:27           ` Yong Wang
  2010-10-08 13:57             ` Wu Fengguang
  2010-10-08 13:59             ` Artem Bityutskiy
  0 siblings, 2 replies; 11+ messages in thread
From: Yong Wang @ 2010-10-08 10:27 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Wu, Xia, Christoph Hellwig, Jens Axboe, Wu, Fengguang,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > However, when the next wake-up interrupt happens is not defined. It can
> > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > says is that there should be some guarantee that sb writeout starts,
> > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > this. But take a look at the range hrtimers - they do exactly this.
> > 
> > If the system is in sleep state, is there any data which should be written?
> 
> May be yes, may be no.
> 

Thanks for the quick response, Artem. May I know what might need to be
written out when system is really idle?

-Yong


* RE: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 10:27       ` Wu, Xia
@ 2010-10-08 10:28         ` Artem Bityutskiy
  2010-10-08 10:27           ` Yong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Artem Bityutskiy @ 2010-10-08 10:28 UTC (permalink / raw)
  To: Wu, Xia
  Cc: Christoph Hellwig, Yong Wang, Jens Axboe, Wu, Fengguang,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > However, when the next wake-up interrupt happens is not defined. It can
> > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > says is that there should be some guarantee that sb writeout starts,
> > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > this. But take a look at the range hrtimers - they do exactly this.
> 
> If the system is in sleep state, is there any data which should be written?

May be yes, may be no.

>  Must 
> sb writeout start even if there isn't any data?

No.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 10:27           ` Yong Wang
@ 2010-10-08 13:57             ` Wu Fengguang
  2010-10-08 14:42               ` Wu Fengguang
  2010-10-08 13:59             ` Artem Bityutskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Wu Fengguang @ 2010-10-08 13:57 UTC (permalink / raw)
  To: Yong Wang
  Cc: Artem Bityutskiy, Wu, Xia, Christoph Hellwig, Jens Axboe,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 06:27:09PM +0800, Yong Wang wrote:
> On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> > On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > > However, when the next wake-up interrupt happens is not defined. It can
> > > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > > says is that there should be some guarantee that sb writeout starts,
> > > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > > this. But take a look at the range hrtimers - they do exactly this.
> > > 
> > > If the system is in sleep state, is there any data which should be written?
> > 
> > May be yes, may be no.
> > 
> 
> Thanks for the quick response, Artem. May I know what might need to be
> written out when system is really idle?

system idle != no dirty inodes

Imagine an application dirties 100MB data and quits. The system then
goes quiet for very long time. In this case we still want the flusher
thread to wake up within 30 seconds to flush the 100MB dirty data.
It's a contract that dirty data will be synced to disk after 30s
(which is the default value of /proc/sys/vm/dirty_expire_centisecs).

Note that 30s is not an exact value. A dirty page may be synced to
disk when it's been dirtied for 35s. The 5s error comes from the
flusher wakeup interval (/proc/sys/vm/dirty_writeback_centisecs).

Thanks,
Fengguang
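
The 30 s / 35 s arithmetic in the message above can be written out as a short
calculation. This is a sketch only: the function name is hypothetical, and it
assumes the worst case is a page that crosses the expiry threshold just after
a flusher pass, so it waits one more full wakeup interval. The two constants
are the default values named in the message, in centiseconds. It describes the
flusher thread, not sync_supers.

```c
#include <assert.h>

/* Defaults of the two sysctls named in the message, in centiseconds. */
#define DIRTY_EXPIRE_CENTISECS     3000	/* /proc/sys/vm/dirty_expire_centisecs, 30 s */
#define DIRTY_WRITEBACK_CENTISECS   500	/* /proc/sys/vm/dirty_writeback_centisecs, 5 s */

/*
 * Hypothetical helper: worst-case age (in centiseconds) of a dirty page when
 * the flusher picks it up. A page that expires right after one flusher pass
 * is only seen by the next pass, one wakeup interval later.
 */
long worst_case_dirty_age_cs(long expire_cs, long writeback_interval_cs)
{
	assert(expire_cs >= 0 && writeback_interval_cs >= 0);
	return expire_cs + writeback_interval_cs;
}
```

With the defaults, worst_case_dirty_age_cs(DIRTY_EXPIRE_CENTISECS,
DIRTY_WRITEBACK_CENTISECS) evaluates to 3500 centiseconds, i.e. the 35 s
mentioned in the message.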


* Re: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 10:27           ` Yong Wang
  2010-10-08 13:57             ` Wu Fengguang
@ 2010-10-08 13:59             ` Artem Bityutskiy
  1 sibling, 0 replies; 11+ messages in thread
From: Artem Bityutskiy @ 2010-10-08 13:59 UTC (permalink / raw)
  To: Yong Wang
  Cc: Wu, Xia, Christoph Hellwig, Jens Axboe, Wu, Fengguang,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 2010-10-08 at 18:27 +0800, Yong Wang wrote:
> On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> > On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > > However, when the next wake-up interrupt happens is not defined. It can
> > > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > > says is that there should be some guarantee that sb writeout starts,
> > > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > > this. But take a look at the range hrtimers - they do exactly this.
> > > 
> > > If the system is in sleep state, is there any data which should be written?
> > 
> > May be yes, may be no.
> > 
> 
> Thanks for the quick response, Artem. May I know what might need to be
> written out when system is really idle?

I do not understand the question. There is dirty data, and it should be
flushed within some time interval.

Anyway, to make the long story short, I made an attempt to optimize this
and stop arming the timer when we have no dirty data. But my solution
was not accepted and Al asked me to just get rid of this timer and whole
sync_supers(). He said this should be pushed down to individual FSes. I
guess the idea is that

1) some FSes actually abuse sb synching, e.g., JFFS2.
2) other FSes can eventually optimize things for themselves.

But I did not find time to do this so far.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
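
The idea mentioned above, not arming the timer while there is no dirty data,
can be sketched as a tiny user-space state machine. This is not the patch
referred to in the message and not kernel code; the struct and function names
are hypothetical and exist only to illustrate the arm-on-dirty pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the superblock timer state, not a kernel structure. */
struct supers_timer_model {
	bool armed;		/* is a wakeup currently scheduled? */
	int dirty_supers;	/* number of superblocks marked dirty */
};

/* Marking a superblock dirty arms the timer if it is not armed already. */
void model_mark_sb_dirty(struct supers_timer_model *m)
{
	assert(m);
	m->dirty_supers++;
	if (!m->armed)
		m->armed = true;
}

/*
 * Timer callback: write out everything that is dirty and do NOT re-arm.
 * The timer stays off until the next superblock is dirtied, so an idle
 * system with clean superblocks sees no periodic wakeups.
 */
void model_timer_fired(struct supers_timer_model *m)
{
	assert(m && m->armed);
	m->dirty_supers = 0;
	m->armed = false;
}
```

In this model a freshly initialised state is unarmed, the first call to
model_mark_sb_dirty() arms it, and model_timer_fired() returns it to the
unarmed state. The wakeup cost is paid only when there is something to write.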


* Re: [PATCH] bdi: use deferable timer for sync_supers task
  2010-10-08 13:57             ` Wu Fengguang
@ 2010-10-08 14:42               ` Wu Fengguang
  0 siblings, 0 replies; 11+ messages in thread
From: Wu Fengguang @ 2010-10-08 14:42 UTC (permalink / raw)
  To: Yong Wang
  Cc: Artem Bityutskiy, Wu, Xia, Christoph Hellwig, Jens Axboe,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Oct 08, 2010 at 09:57:04PM +0800, Wu Fengguang wrote:
> On Fri, Oct 08, 2010 at 06:27:09PM +0800, Yong Wang wrote:
> > On Fri, Oct 08, 2010 at 01:28:07PM +0300, Artem Bityutskiy wrote:
> > > On Fri, 2010-10-08 at 18:27 +0800, Wu, Xia wrote:
> > > > > However, when the next wake-up interrupt happens is not defined. It can
> > > > > happen 1ms after, or 1 minute after, or 1 hour after. What Christoph
> > > > > says is that there should be some guarantee that sb writeout starts,
> > > > > say, within 5 to 10 seconds interval. Deferrable timers do not guarantee
> > > > > this. But take a look at the range hrtimers - they do exactly this.
> > > > 
> > > > If the system is in sleep state, is there any data which should be written?
> > > 
> > > May be yes, may be no.
> > > 
> > 
> > Thanks for the quick response, Artem. May I know what might need to be
> > written out when system is really idle?
> 
> system idle != no dirty inodes

Ah sorry -- I missed the context. Please ignore the following
paragraphs for sync_supers..

Thanks,
Fengguang

> Imagine an application dirties 100MB data and quits. The system then
> goes quiet for very long time. In this case we still want the flusher
> thread to wake up within 30 seconds to flush the 100MB dirty data.
> It's a contract that dirty data will be synced to disk after 30s
> (which is the default value of /proc/sys/vm/dirty_expire_centisecs).
> 
> Note that 30s is not an exact value. A dirty page may be synced to
> disk when it's been dirtied for 35s. The 5s error comes from the
> flusher wakeup interval (/proc/sys/vm/dirty_writeback_centisecs).
> 
> Thanks,
> Fengguang

