cgroups.vger.kernel.org archive mirror
* [PATCHSET 3/3 v3 block/for-4.2/core] writeback: implement foreign cgroup inode bdi_writeback switching
@ 2015-05-22 22:36 Tejun Heo
From: Tejun Heo @ 2015-05-22 22:36 UTC
  To: axboe
  Cc: linux-kernel, jack, hch, hannes, linux-fsdevel, vgoyal, lizefan,
	cgroups, linux-mm, mhocko, clm, fengguang.wu, david, gthelen,
	khlebnikov

Hello,

The changes from the last take [L] are:

* Rebased on top of block/for-4.2/core.

* 0004-truncate-swap-the-order-of-conditionals-in-cancel_di.patch
  became unnecessary due to recent changes to cancel_dirty_page().
  Dropped.

* unlocked_inode_to_wb_begin/end() usages were using the wrong locking
  order when used in combination with memcg stat transactions.  The
  order has been reversed, and might_lock() has been added to
  0007-writeback-add-lockdep-annotation-to-inode_to_wb.patch so that
  bugs like this can be caught reliably.

The previous two patchsets [1][2] implemented cgroup writeback support
and backpressure propagation through the dirty throttling mechanism;
however, the inode is assigned to the wb (bdi_writeback) matching the
first dirtied page and stays there until released.  This first-use
policy can easily lead to gross misbehaviors - a single stray dirty
page can cause gigabytes to be written by the wrong cgroup.  Also,
while concurrent write sharing of an inode is extremely rare and
unsupported, inodes jumping cgroups over time are more common.

This patchset implements foreign cgroup inode detection and wb
switching.  Each writeback run tracks the majority wb being written
using a simple but fairly robust algorithm, and when an inode
persistently writes out more foreign cgroup pages than local ones, the
inode is transferred to the majority winner.
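The cover letter does not name the majority-tracking algorithm.  A
weighted Boyer-Moore majority vote is one simple, constant-space
technique that fits the description, and the sketch below assumes it;
struct wb_vote, its fields and the wb_vote_*() helpers are hypothetical
names, not the patch's code.  Note that Boyer-Moore only guarantees the
right candidate when one wb really wrote more than half of the bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-run tracker; names are illustrative, not the patch's. */
struct wb_vote {
	int      owner_id;    /* wb the inode is currently attached to */
	uint64_t local_bytes; /* bytes written on behalf of owner_id */
	uint64_t total_bytes; /* all bytes written during this run */
	int      cand_id;     /* current majority candidate, -1 if none */
	uint64_t cand_bytes;  /* candidate's remaining vote balance */
};

static void wb_vote_init(struct wb_vote *v, int owner_id)
{
	assert(v);
	v->owner_id = owner_id;
	v->local_bytes = 0;
	v->total_bytes = 0;
	v->cand_id = -1;
	v->cand_bytes = 0;
}

/* Feed one written chunk: @bytes owned by the cgroup wb with id @id. */
static void wb_vote_account(struct wb_vote *v, int id, uint64_t bytes)
{
	assert(v);
	v->total_bytes += bytes;
	if (id == v->owner_id)
		v->local_bytes += bytes;

	/* weighted Boyer-Moore majority vote, byte-weighted */
	if (v->cand_bytes == 0 || id == v->cand_id) {
		v->cand_id = id;            /* adopt or reinforce the candidate */
		v->cand_bytes += bytes;
	} else if (bytes > v->cand_bytes) {
		v->cand_id = id;            /* challenger outweighs the balance */
		v->cand_bytes = bytes - v->cand_bytes;
	} else {
		v->cand_bytes -= bytes;     /* challenger cancels some votes */
	}
}

/* True if foreign wbs wrote more bytes than the owner during this run. */
static bool wb_vote_foreign(const struct wb_vote *v)
{
	return v->total_bytes - v->local_bytes > v->local_bytes;
}
```

For example, with the inode owned by wb 1 and chunks (1, 4096),
(2, 8192), (2, 4096), (3, 4096), the candidate ends up as wb 2, which
wrote 12288 of 20480 bytes, and wb_vote_foreign() reports true.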

This patchset adds 8 bytes to struct inode, making the total per-inode
space overhead of cgroup writeback support 16 bytes on 64-bit systems.
The computational overhead should be negligible.  If the writer
changes from one cgroup to another entirely, the mechanism can render
the correct switch verdict in several seconds of IO time in most cases
and it can converge on the correct answer in a reasonable amount of
time even in more ambiguous cases.
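The 8-byte budget and the multi-second verdict above suggest keeping a
short history of per-round outcomes rather than switching on a single
run.  Below is a minimal sketch under that assumption: a wb id plus a
shift-register history of the last 16 rounds, switching only once more
than half of them were foreign dominated.  The struct, field names,
slot count and threshold are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRN_HIST_SLOTS 16u                  /* rounds remembered (assumed) */
#define FRN_HIST_MASK  ((1u << FRN_HIST_SLOTS) - 1)
#define FRN_SWITCH_THR (FRN_HIST_SLOTS / 2) /* switch when above this (assumed) */

/* Hypothetical 8-byte per-inode state; the patch's real layout may differ. */
struct inode_frn_state {
	uint32_t winner_id; /* wb id that last won a foreign-dominated round */
	uint32_t history;   /* one bit per recent round, 1 = foreign dominated */
};

_Static_assert(sizeof(struct inode_frn_state) == 8,
	       "foreign-detection state must fit in 8 bytes");

static unsigned int popcount32(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Record the outcome of one writeback round.  Returns true once more than
 * half of the remembered rounds were foreign dominated, i.e. the inode
 * should be switched to st->winner_id; the history is then reset.
 */
static bool frn_record_round(struct inode_frn_state *st, bool foreign,
			     uint32_t winner_id)
{
	assert(st);
	st->history = ((st->history << 1) | (foreign ? 1u : 0u)) & FRN_HIST_MASK;
	if (foreign)
		st->winner_id = winner_id;

	if (popcount32(st->history) > FRN_SWITCH_THR) {
		st->history = 0;
		return true;
	}
	return false;
}
```

When every round is foreign, the ninth call returns true.  Strictly
alternating rounds never exceed the threshold, which is how a history
window keeps a few stray writes from triggering a switch.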

This patchset contains the following 9 patches.

 0001-writeback-relocate-wb-_try-_get-wb_put-inode_-attach.patch
 0002-writeback-make-writeback_control-track-the-inode-bei.patch
 0003-writeback-implement-foreign-cgroup-inode-detection.patch
 0004-writeback-implement-locked_-inode_to_wb_and_lock_lis.patch
 0005-writeback-implement-unlocked_inode_to_wb-transaction.patch
 0006-writeback-use-unlocked_inode_to_wb-transaction-in-in.patch
 0007-writeback-add-lockdep-annotation-to-inode_to_wb.patch
 0008-writeback-implement-foreign-cgroup-inode-bdi_writeba.patch
 0009-writeback-disassociate-inodes-from-dying-bdi_writeba.patch

This patchset is on top of

  block/for-4.2/core b04a5636a665 ("block: replace trylock with mutex_lock in blkdev_reread_part()")
+ [1] [PATCHSET 1/3 v4 block/for-4.2/core] writeback: cgroup writeback support
+ [2] [PATCHSET 2/3 v3 block/for-4.2/core] writeback: cgroup writeback backpressure propagation

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-writeback-switch-20150522

diffstat follows.  Thanks.

 fs/buffer.c                      |   26 -
 fs/fs-writeback.c                |  523 ++++++++++++++++++++++++++++++++++++++-
 fs/mpage.c                       |    3 
 include/linux/backing-dev-defs.h |   66 ++++
 include/linux/backing-dev.h      |  144 +++++-----
 include/linux/fs.h               |   11 
 include/linux/mm.h               |    3 
 include/linux/writeback.h        |  123 +++++++++
 mm/backing-dev.c                 |   30 --
 mm/filemap.c                     |    5 
 mm/page-writeback.c              |   27 +-
 11 files changed, 822 insertions(+), 139 deletions(-)

--
tejun

[L] http://lkml.kernel.org/g/1428351508-8399-1-git-send-email-tj@kernel.org
[1] http://lkml.kernel.org/g/1432329245-5844-1-git-send-email-tj@kernel.org
[2] http://lkml.kernel.org/g/1428350674-8303-1-git-send-email-tj@kernel.org


* [PATCHSET 3/3 v4 block/for-4.2/core] writeback: implement foreign cgroup inode bdi_writeback switching
@ 2015-05-28 18:50 Tejun Heo
From: Tejun Heo @ 2015-05-28 18:50 UTC
  To: axboe
  Cc: linux-kernel, jack, hch, hannes, linux-fsdevel, vgoyal, lizefan,
	cgroups, linux-mm, mhocko, clm, fengguang.wu, david, gthelen,
	khlebnikov

Hello,

The changes from the last take [L] are:

* 0002-writeback-make-writeback_control-track-the-inode-bei.patch and
  0003-writeback-implement-foreign-cgroup-inode-detection.patch
  assumed that all wbc's are attached to the inode and wb being
  written out; however, the pageout() path doesn't participate in
  cgroup writeback, leading to an oops.

  pageout() isn't the main writeback path, so the impact on isolation
  is relatively limited, and as the whole path runs on the same
  thread, we don't want it to block on slow cgroups anyway.  In the
  long term, the best route seems to be making the path kick off the
  usual writeback path rather than trying to write pages directly.

  Both patches have been updated to skip cgroup writeback related
  processing if the wbc is not associated with an inode / wb.

* might_lock() on tree_lock dropped from
  0007-writeback-add-lockdep-annotation-to-inode_to_wb.patch due to
  spurious locking context warnings.  Unfortunately, there isn't a
  simple way to express _irqsave for might_lock().
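The pageout() fix in the first item amounts to an early return
whenever a wbc carries no inode / wb association.  A minimal sketch
with heavily simplified stand-ins: struct sk_wb, struct sk_inode,
struct wbc_sketch and wbc_sketch_account() are hypothetical and are not
the kernel's struct writeback_control API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sk_wb    { int id; };
struct sk_inode { struct sk_wb *wb; };

struct wbc_sketch {
	struct sk_inode *inode; /* NULL when the caller did not attach one */
	struct sk_wb    *wb;    /* wb this writeback is being issued through */
	uint64_t         local_bytes;
	uint64_t         foreign_bytes;
};

/* Account @bytes owned by cgroup wb @id; a no-op for unattached wbcs. */
static void wbc_sketch_account(struct wbc_sketch *wbc, int id, uint64_t bytes)
{
	assert(wbc);
	if (!wbc->inode || !wbc->wb)
		return; /* not participating in cgroup writeback: skip */

	if (id == wbc->wb->id)
		wbc->local_bytes += bytes;
	else
		wbc->foreign_bytes += bytes;
}
```

A wbc built without an inode or wb, as on the pageout() path, is
simply left untouched, so foreign detection never dereferences a
missing association.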

The previous two patchsets [1][2] implemented cgroup writeback support
and backpressure propagation through the dirty throttling mechanism;
however, the inode is assigned to the wb (bdi_writeback) matching the
first dirtied page and stays there until released.  This first-use
policy can easily lead to gross misbehaviors - a single stray dirty
page can cause gigabytes to be written by the wrong cgroup.  Also,
while concurrent write sharing of an inode is extremely rare and
unsupported, inodes jumping cgroups over time are more common.

This patchset implements foreign cgroup inode detection and wb
switching.  Each writeback run tracks the majority wb being written
using a simple but fairly robust algorithm, and when an inode
persistently writes out more foreign cgroup pages than local ones, the
inode is transferred to the majority winner.

This patchset adds 8 bytes to struct inode, making the total per-inode
space overhead of cgroup writeback support 16 bytes on 64-bit systems.
The computational overhead should be negligible.  If the writer
changes from one cgroup to another entirely, the mechanism can render
the correct switch verdict in several seconds of IO time in most cases
and it can converge on the correct answer in a reasonable amount of
time even in more ambiguous cases.

This patchset contains the following 9 patches.

 0001-writeback-relocate-wb-_try-_get-wb_put-inode_-attach.patch
 0002-writeback-make-writeback_control-track-the-inode-bei.patch
 0003-writeback-implement-foreign-cgroup-inode-detection.patch
 0004-writeback-implement-locked_-inode_to_wb_and_lock_lis.patch
 0005-writeback-implement-unlocked_inode_to_wb-transaction.patch
 0006-writeback-use-unlocked_inode_to_wb-transaction-in-in.patch
 0007-writeback-add-lockdep-annotation-to-inode_to_wb.patch
 0008-writeback-implement-foreign-cgroup-inode-bdi_writeba.patch
 0009-writeback-disassociate-inodes-from-dying-bdi_writeba.patch

This patchset is on top of

  block/for-4.2/core b04a5636a665 ("block: replace trylock with mutex_lock in blkdev_reread_part()")
+ [1] [PATCHSET 1/3 v4 block/for-4.2/core] writeback: cgroup writeback support
+ [2] [PATCHSET 2/3 v3 block/for-4.2/core] writeback: cgroup writeback backpressure propagation

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-writeback-switch-20150528

diffstat follows.  Thanks.

 fs/buffer.c                      |   26 -
 fs/fs-writeback.c                |  532 ++++++++++++++++++++++++++++++++++++++-
 fs/mpage.c                       |    3 
 include/linux/backing-dev-defs.h |   66 ++++
 include/linux/backing-dev.h      |  142 ++++------
 include/linux/fs.h               |   11 
 include/linux/mm.h               |    3 
 include/linux/writeback.h        |  130 +++++++++
 mm/backing-dev.c                 |   30 --
 mm/filemap.c                     |    5 
 mm/page-writeback.c              |   27 +
 11 files changed, 836 insertions(+), 139 deletions(-)

--
tejun

[L] http://lkml.kernel.org/g/1432334183-6324-1-git-send-email-tj@kernel.org
[1] http://lkml.kernel.org/g/1432329245-5844-1-git-send-email-tj@kernel.org
[2] http://lkml.kernel.org/g/1428350674-8303-1-git-send-email-tj@kernel.org



Thread overview: 12+ messages
2015-05-22 22:36 [PATCHSET 3/3 v3 block/for-4.2/core] writeback: implement foreign cgroup inode bdi_writeback switching Tejun Heo
2015-05-22 22:36 ` [PATCH 1/9] writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb() Tejun Heo
2015-05-22 22:36 ` [PATCH 2/9] writeback: make writeback_control track the inode being written back Tejun Heo
2015-05-22 22:36 ` [PATCH 3/9] writeback: implement foreign cgroup inode detection Tejun Heo
2015-05-22 22:36 ` [PATCH 4/9] writeback: implement [locked_]inode_to_wb_and_lock_list() Tejun Heo
2015-05-22 22:36 ` [PATCH 5/9] writeback: implement unlocked_inode_to_wb transaction and use it for stat updates Tejun Heo
2015-05-22 22:36 ` [PATCH 6/9] writeback: use unlocked_inode_to_wb transaction in inode_congested() Tejun Heo
2015-05-22 22:36 ` [PATCH 7/9] writeback: add lockdep annotation to inode_to_wb() Tejun Heo
     [not found]   ` <1432334183-6324-8-git-send-email-tj@kernel.org>
2015-05-28  0:03     ` [PATCH v5 " Tejun Heo
2015-05-22 22:36 ` [PATCH 8/9] writeback: implement foreign cgroup inode bdi_writeback switching Tejun Heo
2015-05-22 22:36 ` [PATCH 9/9] writeback: disassociate inodes from dying bdi_writebacks Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2015-05-28 18:50 [PATCHSET 3/3 v4 block/for-4.2/core] writeback: implement foreign cgroup inode bdi_writeback switching Tejun Heo
2015-05-28 18:50 ` [PATCH 8/9] " Tejun Heo
