linux-fsdevel.vger.kernel.org archive mirror
* [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach)
@ 2011-03-08 22:31 Jan Kara
  2011-03-08 22:31 ` [PATCH 1/5] writeback: account per-bdi accumulated written pages Jan Kara
                   ` (6 more replies)
  0 siblings, 7 replies; 49+ messages in thread
From: Jan Kara @ 2011-03-08 22:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, Wu Fengguang, Peter Zijlstra, Andrew Morton


  Hello,

  I'm posting the second version of my IO-less balance_dirty_pages() patches.
This is an alternative approach to Fengguang's patches - much simpler, I
believe (only 300 lines added) - but obviously it does not provide such
sophisticated control. Fengguang is currently running some tests on my patches
so that we can compare the two approaches.

The basic idea (implemented in the third patch) is that processes throttled
in balance_dirty_pages() wait for enough IO to complete. The waiting is
implemented as follows: Whenever we decide to throttle a task in
balance_dirty_pages(), the task adds itself to a list of tasks that are
throttled against that bdi and goes to sleep, waiting to receive a specified
number of page IO completions. Once in a while (currently every HZ/10; in
patch 5 the interval is autotuned based on the observed IO rate), accumulated
page IO completions are distributed equally among the waiting tasks.
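
To make the mechanism concrete, here is a rough sketch of the waiting scheme
in kernel-style C. This is not code from the series; the per-bdi fields
(waiter_lock, waiter_list, waiter_wq, accumulated_written) and the function
names are made up for illustration:

#include <linux/backing-dev.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

/* One entry per task currently throttled against a bdi. */
struct bdi_waiter {
        struct list_head list;
        long pages_to_wait;     /* page completions still owed to us */
};

/* Called from balance_dirty_pages() when a task has to be throttled. */
static void bdi_wait_for_completions(struct backing_dev_info *bdi,
                                     long nr_pages)
{
        struct bdi_waiter w = { .pages_to_wait = nr_pages };

        spin_lock(&bdi->waiter_lock);
        list_add_tail(&w.list, &bdi->waiter_list);
        spin_unlock(&bdi->waiter_lock);

        /* Sleep until the periodic distribution below has paid our share. */
        wait_event(bdi->waiter_wq, w.pages_to_wait <= 0);
}

/* Run roughly every HZ/10: split accumulated completions equally. */
static void bdi_distribute_completions(struct backing_dev_info *bdi)
{
        long written = atomic_long_xchg(&bdi->accumulated_written, 0);
        struct bdi_waiter *w, *tmp;
        long nr_waiters = 0, share;

        spin_lock(&bdi->waiter_lock);
        list_for_each_entry(w, &bdi->waiter_list, list)
                nr_waiters++;
        if (nr_waiters) {
                share = written / nr_waiters;
                list_for_each_entry_safe(w, tmp, &bdi->waiter_list, list) {
                        w->pages_to_wait -= share;
                        if (w->pages_to_wait <= 0)
                                list_del_init(&w->list);
                }
                wake_up_all(&bdi->waiter_wq);
        }
        spin_unlock(&bdi->waiter_lock);
}

In the actual series the completions presumably come from the per-bdi
accounting of written pages added in patch 1; the sketch uses a private
counter only to stay self-contained.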

This waiting scheme has been chosen so that the waiting time in
balance_dirty_pages() is proportional to
  number_waited_pages * number_of_waiters.
In particular, it does not depend on the total number of pages being waited
for, thus possibly providing fairer results.
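
To illustrate with made-up numbers: if the bdi completes pages at a rate of
C pages/s and there are N waiters, each waiter receives roughly C/N
completions per second, so a task waiting for W pages sleeps for about

  W / (C / N) = W * N / C  seconds.

For example, with C = 10000 pages/s, N = 4 waiters and W = 1500 pages the
wait is about 1500 * 4 / 10000 = 0.6 s - regardless of whether the other
three waiters are each asking for 10 or for 10000 pages.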

Since the last version I've implemented cleanups as suggested by Peter
Zijlstra. The patches have undergone more thorough testing. So far I've tested
different filesystems (ext2, ext3, ext4, xfs, nfs), and also a combination of
a local filesystem and nfs. The load was either a varying number of dd threads
or fio with several threads, each dirtying pages at a different speed.

Results and test scripts can be found at
  http://beta.suse.com/private/jack/balance_dirty_pages-v2/
See README file for some explanation of test framework, tests, and graphs.
Except for ext3 in data=ordered mode, where kjournald creates high
fluctuations in the waiting time of throttled processes (and also high
latencies), the results look OK. Parallel dd threads are throttled in the
same way (in a 2 s window the threads spend the same amount of time waiting)
and the latencies of individual waits also look OK - except for ext3, they
fit within 100 ms for local filesystems. They are in the 200-500 ms range for
NFS, which isn't that nice, but to fix that we'd have to modify the current
ratelimiting scheme to take into account on which bdi a page is dirtied. Then
we could ratelimit slower BDIs more often, thus reducing the latencies of
individual waits...
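
For illustration only, such a per-bdi ratelimit could look roughly like the
sketch below. This is not part of the series; ratelimit_pages is the existing
global threshold in mm/page-writeback.c, while bdi->avg_write_bandwidth and
global_write_bandwidth are hypothetical fields that would have to be
estimated from the per-bdi writeback accounting:

#include <linux/backing-dev.h>
#include <linux/kernel.h>

/*
 * Derive a per-bdi dirtying threshold: tasks dirtying pages on a slow
 * bdi would call balance_dirty_pages() after fewer dirtied pages and
 * thus wait more often but for shorter periods.
 */
static unsigned long bdi_dirty_ratelimit(struct backing_dev_info *bdi)
{
        unsigned long limit;

        /* Slower bdi => smaller threshold => more frequent, shorter waits. */
        limit = ratelimit_pages * bdi->avg_write_bandwidth /
                max(global_write_bandwidth, 1UL);
        return clamp(limit, 8UL, (unsigned long)ratelimit_pages);
}

The per-cpu counter check in balance_dirty_pages_ratelimited_nr() would then
compare against bdi_dirty_ratelimit(mapping->backing_dev_info) instead of the
global ratelimit_pages.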

The results for the fio load with different bandwidths are interesting. There
are 8 threads dirtying pages at rates of 1, 2, 4, ..., 128 MB/s. Due to the
different per-task bdi dirty limits, the three most aggressive tasks get
throttled, so they end up at bandwidths of 24, 26, and 30 MB/s, while the
lighter dirtiers run unthrottled.

I'm planning to run some tests with multiple SATA drives to verify that there
aren't any unexpected fluctuations. But currently I'm having some trouble with
the HW...

As usual, comments are welcome :).

								Honza

* [RFC PATCH 0/5] IO-less balance dirty pages
@ 2011-02-04  1:38 Jan Kara
  2011-02-04  1:38 ` [PATCH 3/5] mm: Implement IO-less balance_dirty_pages() Jan Kara
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Kara @ 2011-02-04  1:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm


  Hi,

  I've decided to take my stab at making balance_dirty_pages() not submit
IO :). I hope to end up with something simpler than Fengguang's patches, and
we'll see whether it is good enough.

The basic idea (implemented in the third patch) is that processes throttled
in balance_dirty_pages() wait for enough IO to complete. The waiting is
implemented as follows: Whenever we decide to throttle a task in
balance_dirty_pages(), the task adds itself to a list of tasks that are
throttled against that bdi and goes to sleep, waiting to receive a specified
number of page IO completions. Once in a while (currently every HZ/10; in
patch 5 the interval is autotuned based on the observed IO rate), accumulated
page IO completions are distributed equally among the waiting tasks.

This waiting scheme has been chosen so that the waiting time in
balance_dirty_pages() is proportional to
  number_waited_pages * number_of_waiters.
In particular, it does not depend on the total number of pages being waited
for, thus possibly providing fairer results.

I gave the patches some basic testing (multiple parallel dd's to a single
drive) and they seem to work OK. The dd's get an equal share of the disk
throughput (about 10.5 MB/s, which is a nice result given that the disk can
do about 87 MB/s when writing single-threaded), and the dirty limit does not
get exceeded. Of course much more testing needs to be done, but I hope this
is fine for a first posting :).

Comments welcome.

								Honza


Thread overview: 49+ messages
2011-03-08 22:31 [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach) Jan Kara
2011-03-08 22:31 ` [PATCH 1/5] writeback: account per-bdi accumulated written pages Jan Kara
2011-03-08 22:31 ` [PATCH 2/5] mm: Properly reflect task dirty limits in dirty_exceeded logic Jan Kara
2011-03-09 21:02   ` Vivek Goyal
2011-03-14 20:44     ` Jan Kara
2011-03-15 15:21       ` Vivek Goyal
2011-03-08 22:31 ` [PATCH 3/5] mm: Implement IO-less balance_dirty_pages() Jan Kara
2011-03-10  0:07   ` Vivek Goyal
2011-03-14 20:48     ` Jan Kara
2011-03-15 15:23       ` Vivek Goyal
2011-03-16 21:26         ` Curt Wohlgemuth
2011-03-16 22:53           ` Curt Wohlgemuth
2011-03-16 16:53   ` Vivek Goyal
2011-03-16 19:10     ` Jan Kara
2011-03-16 19:31       ` Vivek Goyal
2011-03-16 19:58         ` Jan Kara
2011-03-16 20:22           ` Vivek Goyal
2011-03-08 22:31 ` [PATCH 4/5] mm: Remove low limit from sync_writeback_pages() Jan Kara
2011-03-08 22:31 ` [PATCH 5/5] mm: Autotune interval between distribution of page completions Jan Kara
2011-03-17 15:46 ` [PATCH RFC 0/5] IO-less balance_dirty_pages() v2 (simple approach) Curt Wohlgemuth
2011-03-17 15:51   ` Christoph Hellwig
2011-03-17 16:24     ` Curt Wohlgemuth
2011-03-17 16:43       ` Christoph Hellwig
2011-03-17 17:32   ` Jan Kara
2011-03-17 18:55     ` Curt Wohlgemuth
2011-03-17 22:56       ` Vivek Goyal
2011-03-18 14:30 ` Wu Fengguang
2011-03-22 21:43   ` Jan Kara
2011-03-23  4:41     ` Dave Chinner
2011-03-25 12:59       ` Wu Fengguang
2011-03-25 13:44     ` Wu Fengguang
2011-03-25 23:05       ` Jan Kara
2011-03-28  2:44         ` Wu Fengguang
2011-03-28 15:08           ` Jan Kara
2011-03-29  1:44             ` Wu Fengguang
2011-03-29  2:14           ` Dave Chinner
2011-03-29  2:41             ` Wu Fengguang
2011-03-29  5:59               ` Dave Chinner
2011-03-29  7:31                 ` Wu Fengguang
2011-03-29  7:52                   ` Wu Fengguang
  -- strict thread matches above, loose matches on Subject: below --
2011-02-04  1:38 [RFC PATCH 0/5] IO-less balance dirty pages Jan Kara
2011-02-04  1:38 ` [PATCH 3/5] mm: Implement IO-less balance_dirty_pages() Jan Kara
2011-02-04 13:09   ` Peter Zijlstra
2011-02-11 14:56     ` Jan Kara
2011-02-04 13:09   ` Peter Zijlstra
2011-02-04 13:19     ` Peter Zijlstra
2011-02-11 15:46     ` Jan Kara
2011-02-22 15:40       ` Peter Zijlstra
2011-02-04 13:09   ` Peter Zijlstra
2011-02-11 14:56     ` Jan Kara
