* [GIT PULL] Cgroup writeback support for 4.2
From: Jens Axboe @ 2015-06-25 14:44 UTC
To: torvalds; +Cc: linux-kernel
Hi Linus,
This is the big pull request for adding cgroup writeback support. This
code has been in development for a long time, and it has been simmering
in for-next for a good chunk of this cycle too. This is one of those
problems that has been talked about for at least half a decade, finally
there's a solution and code to go with it.
Also see last week's writeup on LWN:
http://lwn.net/Articles/648292/
This pull request is on top of for-4.2/core, sent out earlier.
Please pull!
git://git.kernel.dk/linux-block.git for-4.2/writeback
----------------------------------------------------------------
Greg Thelen (1):
memcg: add per cgroup dirty page accounting
Jens Axboe (1):
buffer: remove unusued 'ret' variable
Tejun Heo (83):
page_writeback: revive cancel_dirty_page() in a restricted form
blkcg: move block/blk-cgroup.h to include/linux/blk-cgroup.h
update !CONFIG_BLK_CGROUP dummies in include/linux/blk-cgroup.h
blkcg: always create the blkcg_gq for the root blkcg
memcg: add mem_cgroup_root_css
blkcg: add blkcg_root_css
cgroup, block: implement task_get_css() and use it in bio_associate_current()
blkcg: implement task_get_blkcg_css()
blkcg: implement bio_associate_blkcg()
memcg: implement mem_cgroup_css_from_page()
writeback: move backing_dev_info->state into bdi_writeback
writeback: move backing_dev_info->bdi_stat[] into bdi_writeback
writeback: move bandwidth related fields from backing_dev_info into bdi_writeback
writeback: s/bdi/wb/ in mm/page-writeback.c
writeback: move backing_dev_info->wb_lock and ->worklist into bdi_writeback
writeback: reorganize mm/backing-dev.c
writeback: separate out include/linux/backing-dev-defs.h
bdi: make inode_to_bdi() inline
writeback: add @gfp to wb_init()
bdi: separate out congested state into a separate struct
writeback: add {CONFIG|BDI_CAP|FS}_CGROUP_WRITEBACK
writeback: make backing_dev_info host cgroup-specific bdi_writebacks
writeback, blkcg: associate each blkcg_gq with the corresponding bdi_writeback_congested
writeback: attribute stats to the matching per-cgroup bdi_writeback
writeback: let balance_dirty_pages() work on the matching cgroup bdi_writeback
writeback: make congestion functions per bdi_writeback
writeback, blkcg: restructure blk_{set|clear}_queue_congested()
writeback, blkcg: propagate non-root blkcg congestion state
writeback: implement and use inode_congested()
writeback: implement WB_has_dirty_io wb_state flag
writeback: implement backing_dev_info->tot_write_bandwidth
writeback: make bdi_has_dirty_io() take multiple bdi_writeback's into account
writeback: don't issue wb_writeback_work if clean
writeback: make bdi->min/max_ratio handling cgroup writeback aware
writeback: implement bdi_for_each_wb()
writeback: remove bdi_start_writeback()
writeback: make laptop_mode_timer_fn() handle multiple bdi_writeback's
writeback: make writeback_in_progress() take bdi_writeback instead of backing_dev_info
writeback: make bdi_start_background_writeback() take bdi_writeback instead of backing_dev_info
writeback: make wakeup_flusher_threads() handle multiple bdi_writeback's
writeback: make wakeup_dirtytime_writeback() handle multiple bdi_writeback's
writeback: add wb_writeback_work->auto_free
writeback: implement bdi_wait_for_completion()
writeback: implement wb_wait_for_single_work()
writeback: restructure try_writeback_inodes_sb[_nr]()
writeback: make writeback initiation functions handle multiple bdi_writeback's
writeback: dirty inodes against their matching cgroup bdi_writeback's
buffer, writeback: make __block_write_full_page() honor cgroup writeback
mpage: make __mpage_writepage() honor cgroup writeback
ext2: enable cgroup writeback support
memcg: make mem_cgroup_read_{stat|event}() iterate possible cpus instead of online
writeback: clean up wb_dirty_limit()
writeback: reorganize [__]wb_update_bandwidth()
writeback: implement wb_domain
writeback: move global_dirty_limit into wb_domain
writeback: consolidate dirty throttle parameters into dirty_throttle_control
writeback: add dirty_throttle_control->wb_bg_thresh
writeback: make __wb_calc_thresh() take dirty_throttle_control
writeback: add dirty_throttle_control->pos_ratio
writeback: add dirty_throttle_control->wb_completions
writeback: add dirty_throttle_control->dom
writeback: make __wb_writeout_inc() and hard_dirty_limit() take wb_domaas a parameter
writeback: separate out domain_dirty_limits()
writeback: move over_bground_thresh() to mm/page-writeback.c
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
writeback: implement memcg wb_domain
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg writeback domain based throttling
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
writeback: make writeback_control track the inode being written back
writeback: implement foreign cgroup inode detection
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: add lockdep annotation to inode_to_wb()
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: disassociate inodes from dying bdi_writebacks
bdi: fix wrong error return value in cgwb_create()
v9fs: fix error handling in v9fs_session_init()
writeback: do foreign inode detection iff cgroup writeback is enabled
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback, blkio: add documentation for cgroup writeback support
Documentation/cgroups/blkio-controller.txt | 83 +-
Documentation/cgroups/memory.txt | 1 +
block/bio.c | 35 +-
block/blk-cgroup.c | 124 +-
block/blk-core.c | 70 +-
block/blk-integrity.c | 1 +
block/blk-sysfs.c | 3 +-
block/blk-throttle.c | 2 +-
block/bounce.c | 1 +
block/cfq-iosched.c | 2 +-
block/elevator.c | 2 +-
block/genhd.c | 1 +
drivers/block/drbd/drbd_int.h | 1 +
drivers/block/drbd/drbd_main.c | 10 +-
drivers/block/pktcdvd.c | 1 +
drivers/char/raw.c | 1 +
drivers/md/bcache/request.c | 1 +
drivers/md/dm.c | 2 +-
drivers/md/dm.h | 1 +
drivers/md/md.h | 1 +
drivers/md/raid1.c | 4 +-
drivers/md/raid10.c | 2 +-
drivers/mtd/devices/block2mtd.c | 1 +
.../lustre/include/linux/lustre_patchless_compat.h | 4 +-
fs/9p/v9fs.c | 50 +-
fs/9p/vfs_super.c | 8 +-
fs/block_dev.c | 9 +-
fs/buffer.c | 64 +-
fs/ext2/super.c | 1 +
fs/ext4/extents.c | 1 +
fs/ext4/mballoc.c | 1 +
fs/ext4/super.c | 1 +
fs/f2fs/node.c | 4 +-
fs/f2fs/segment.h | 3 +-
fs/fat/file.c | 1 +
fs/fat/inode.c | 1 +
fs/fs-writeback.c | 1167 +++++++++++++++----
fs/fuse/file.c | 12 +-
fs/gfs2/super.c | 2 +-
fs/hfs/super.c | 1 +
fs/hfsplus/super.c | 1 +
fs/inode.c | 1 +
fs/mpage.c | 3 +
fs/nfs/filelayout/filelayout.c | 1 +
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 3 +-
fs/ocfs2/file.c | 1 +
fs/reiserfs/super.c | 1 +
fs/ufs/super.c | 1 +
fs/xfs/xfs_aops.c | 12 +-
fs/xfs/xfs_file.c | 1 +
include/linux/backing-dev-defs.h | 255 ++++
include/linux/backing-dev.h | 557 ++++++---
include/linux/bio.h | 3 +
{block => include/linux}/blk-cgroup.h | 32 +-
include/linux/blkdev.h | 21 +-
include/linux/cgroup.h | 25 +
include/linux/fs.h | 26 +-
include/linux/memcontrol.h | 29 +
include/linux/mm.h | 8 +-
include/linux/pagemap.h | 3 +-
include/linux/writeback.h | 221 +++-
include/trace/events/writeback.h | 15 +-
init/Kconfig | 5 +
mm/backing-dev.c | 649 ++++++++---
mm/fadvise.c | 2 +-
mm/filemap.c | 34 +-
mm/madvise.c | 1 +
mm/memcontrol.c | 223 +++-
mm/page-writeback.c | 1231 +++++++++++++-------
mm/readahead.c | 2 +-
mm/rmap.c | 2 +
mm/truncate.c | 18 +-
mm/vmscan.c | 79 +-
74 files changed, 3898 insertions(+), 1250 deletions(-)
create mode 100644 include/linux/backing-dev-defs.h
rename {block => include/linux}/blk-cgroup.h (96%)
--
Jens Axboe
* Re: [GIT PULL] Cgroup writeback support for 4.2
From: Geert Uytterhoeven @ 2015-06-26 9:49 UTC
To: Jens Axboe, Tejun Heo; +Cc: torvalds, linux-kernel@vger.kernel.org
On Thu, Jun 25, 2015 at 4:44 PM, Jens Axboe <axboe@fb.com> wrote:
> This is the big pull request for adding cgroup writeback support. This
> code has been in development for a long time, and it has been simmering
> in for-next for a good chunk of this cycle too. This is one of those
> problems that has been talked about for at least half a decade, finally
> there's a solution and code to go with it.
Spoiler for TLDR: These are all false positives.
If CONFIG_CGROUP_WRITEBACK=n:
mm/page-writeback.c: In function ‘balance_dirty_pages_ratelimited’:
mm/page-writeback.c:1574: warning: ‘writeback’ is used
uninitialized in this function
In this case, mem_cgroup_wb_stats() is a dummy function that doesn't
write to its output parameters, hence writeback will contain arbitrary data.
There's another call to mem_cgroup_wb_stats() in
wb_over_bg_thresh() where my gcc 4.1.2 didn't warn, where it probably
deduced that mdtc will always be NULL, and the branch thus never taken.
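To illustrate the first warning, here is a minimal sketch of the situation, not the actual mm/page-writeback.c code. All `demo_*` and `DEMO_*` names are hypothetical stand-ins: `DEMO_CGROUP_WRITEBACK` plays the role of CONFIG_CGROUP_WRITEBACK, and `demo_wb_stats()` plays the role of the dummy mem_cgroup_wb_stats(). With the switch off, the pointer is a constant NULL, so the read of `writeback` is unreachable, but a compiler that does not propagate that constant may still warn.

```c
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_CGROUP_WRITEBACK; 0 means "disabled". */
#define DEMO_CGROUP_WRITEBACK 0

struct demo_wb {
	unsigned long nr_writeback;
};

#if DEMO_CGROUP_WRITEBACK
#define DEMO_WB(ptr) (ptr)
/* Real version: fills in the output parameter. */
static inline void demo_wb_stats(const struct demo_wb *wb,
				 unsigned long *writeback)
{
	*writeback = wb->nr_writeback;
}
#else
/* Disabled: the cgroup pointer is always NULL, whatever the caller passed. */
#define DEMO_WB(ptr) ((void)(ptr), (struct demo_wb *)NULL)
/* Dummy version: deliberately leaves *writeback untouched. */
static inline void demo_wb_stats(const struct demo_wb *wb,
				 unsigned long *writeback)
{
	(void)wb;
	(void)writeback;
}
#endif

/* Returns the cgroup writeback count, or 0 when there is no cgroup state. */
unsigned long demo_cgroup_writeback(struct demo_wb *cgroup_wb)
{
	struct demo_wb *const wb = DEMO_WB(cgroup_wb);
	unsigned long writeback;	/* intentionally not initialized */

	if (wb) {
		/* Dead code when disabled: wb is a constant NULL. */
		demo_wb_stats(wb, &writeback);
		return writeback;
	}
	return 0;
}
```

With the switch at 0, calling `demo_cgroup_writeback()` with any argument takes the `return 0` path, so the uninitialized variable is never read; whether a given compiler can prove that is exactly what decides if the warning appears.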
Regardless of CONFIG_CGROUP_WRITEBACK:
mm/page-writeback.c: In function ‘balance_dirty_pages_ratelimited’:
mm/page-writeback.c:1537: warning: ‘m_bg_thresh’ may be used
uninitialized in this function
mm/page-writeback.c:1537: warning: ‘m_thresh’ may be used
uninitialized in this function
mm/page-writeback.c:1537: warning: ‘m_dirty’ may be used
uninitialized in this function
But these are false positives too, due to the many tests on mdtc or !mdtc,
and the creative use of dummy *_INIT() macros and mdtc_valid() static inline
functions.
I suggest refactoring this code to make it less fragile, though.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [GIT PULL] Cgroup writeback support for 4.2
From: Tejun Heo @ 2015-06-26 13:43 UTC
To: Geert Uytterhoeven; +Cc: Jens Axboe, torvalds, linux-kernel@vger.kernel.org
Hello, Geert.
On Fri, Jun 26, 2015 at 11:49:58AM +0200, Geert Uytterhoeven wrote:
> Spoiler for TLDR: These are all false positives.
>
> If CONFIG_CGROUP_WRITEBACK=n:
>
> mm/page-writeback.c: In function ‘balance_dirty_pages_ratelimited’:
> mm/page-writeback.c:1574: warning: ‘writeback’ is used
> uninitialized in this function
>
> In this case, mem_cgroup_wb_stats() is a dummy function that doesn't
> write to its output parameters, hence writeback will contain arbitrary data.
>
> There's another call to mem_cgroup_wb_stats() in
> wb_over_bg_thresh() where my gcc 4.1.2 didn't warn, where it probably
> deduced that mdtc will always be NULL, and the branch thus never taken.
Can you please tell me the version of gcc which triggered the above
warnings?
> Regardless of CONFIG_CGROUP_WRITEBACK:
>
> mm/page-writeback.c: In function ‘balance_dirty_pages_ratelimited’:
> mm/page-writeback.c:1537: warning: ‘m_bg_thresh’ may be used
> uninitialized in this function
> mm/page-writeback.c:1537: warning: ‘m_thresh’ may be used
> uninitialized in this function
> mm/page-writeback.c:1537: warning: ‘m_dirty’ may be used
> uninitialized in this function
>
> But these are false positives too, due to the many tests on mdtc or !mdtc,
> and the creative use of dummy *_INIT() macros and mdtc_valid() static inline
> functions.
> I suggest refactoring this code to make it less fragile, though.
It's written that way deliberately, so that the compiler warns if later
code breaks something and code paths which should be left out when
!CGROUP_WRITEBACK aren't, without impeding readability with ifdefs and
awkwardly split dummy functions.
The hope was that most compilers in use today are smart enough to
notice the code paths which are being disabled (it's pretty darn
obvious) and it seemed that way given the dearth of build warning
reports from -next.
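The pattern being described can be sketched roughly as follows. This is a simplified illustration under assumed names, not the code from the pull request: `DEMO_MDTC_INIT()` and `demo_mdtc_valid()` are hypothetical analogues of the *_INIT() macros and mdtc_valid() mentioned above, and `DEMO_CGROUP_WRITEBACK` stands in for the config option. The point is that the caller contains no #ifdef at all; the disabled paths disappear because `mdtc` folds to a constant NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_CGROUP_WRITEBACK; 0 means "disabled". */
#define DEMO_CGROUP_WRITEBACK 0

struct demo_dtc {
	const char *dom;	/* non-NULL only when a memcg domain exists */
	unsigned long dirty;
};

#if DEMO_CGROUP_WRITEBACK
#define DEMO_MDTC_INIT(name)	.dom = (name)
static inline bool demo_mdtc_valid(const struct demo_dtc *dtc)
{
	return dtc->dom != NULL;
}
#else
#define DEMO_MDTC_INIT(name)	.dom = NULL
/* Always false, so every "if (mdtc)" branch below is compiled out. */
static inline bool demo_mdtc_valid(const struct demo_dtc *dtc)
{
	(void)dtc;
	return false;
}
#endif

/* Sums global and (if present) memcg dirty counts, with no #ifdef in sight. */
unsigned long demo_dirty_total(unsigned long global_dirty,
			       unsigned long memcg_dirty)
{
	struct demo_dtc mdtc_stor = { DEMO_MDTC_INIT("memcg") };
	struct demo_dtc *const mdtc =
		demo_mdtc_valid(&mdtc_stor) ? &mdtc_stor : NULL;
	unsigned long m_dirty;	/* only assigned on the cgroup path */

	if (mdtc) {
		mdtc->dirty = memcg_dirty;
		m_dirty = mdtc->dirty;
	}

	/* An unguarded use of m_dirty added here later should draw a warning. */

	if (mdtc)
		return global_dirty + m_dirty;
	return global_dirty;
}
```

Leaving `m_dirty` uninitialized is the deliberate part of the design: a future change that reads it outside an `if (mdtc)` guard gives the compiler something to complain about, which an `= 0` initializer would silence.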
Thanks.
--
tejun
* Re: [GIT PULL] Cgroup writeback support for 4.2
From: Geert Uytterhoeven @ 2015-06-26 13:57 UTC
To: Tejun Heo; +Cc: Jens Axboe, torvalds, linux-kernel@vger.kernel.org
Hi Tejun,
On Fri, Jun 26, 2015 at 3:43 PM, Tejun Heo <tj@kernel.org> wrote:
> On Fri, Jun 26, 2015 at 11:49:58AM +0200, Geert Uytterhoeven wrote:
>> Spoiler for TLDR: These are all false positives.
>>
>> If CONFIG_CGROUP_WRITEBACK=n:
>>
>> mm/page-writeback.c: In function ‘balance_dirty_pages_ratelimited’:
>> mm/page-writeback.c:1574: warning: ‘writeback’ is used
>> uninitialized in this function
>>
>> In this case, mem_cgroup_wb_stats() is a dummy function that doesn't
>> write to its output parameters, hence writeback will contain arbitrary data.
>>
>> There's another call to mem_cgroup_wb_stats() in
>> wb_over_bg_thresh() where my gcc 4.1.2 didn't warn, where it probably
>> deduced that mdtc will always be NULL, and the branch thus never taken.
>
> Can you please tell me the version of gcc which triggered the above
> warnings?
gcc 4.1.2
>> Regardless of CONFIG_CGROUP_WRITEBACK:
>>
>> mm/page-writeback.c: In function ‘balance_dirty_pages_ratelimited’:
>> mm/page-writeback.c:1537: warning: ‘m_bg_thresh’ may be used
>> uninitialized in this function
>> mm/page-writeback.c:1537: warning: ‘m_thresh’ may be used
>> uninitialized in this function
>> mm/page-writeback.c:1537: warning: ‘m_dirty’ may be used
>> uninitialized in this function
>>
>> But these are false positives too, due to the many tests on mdtc or !mdtc,
>> and the creative use of dummy *_INIT() macros and mdtc_valid() static inline
>> functions.
>> I suggest refactoring this code to make it less fragile, though.
>
> It's written that way deliberately, so that the compiler warns if later
> code breaks something and code paths which should be left out when
> !CGROUP_WRITEBACK aren't, without impeding readability with ifdefs and
> awkwardly split dummy functions.
>
> The hope was that most compilers in use today are smart enough to
> notice the code paths which are being disabled (it's pretty darn
> obvious) and it seemed that way given the dearth of build warning
> reports from -next.
That's why I keep on using gcc 4.1.2: it still gives build warnings for
many "used uninitialized" cases that later gcc versions let pass silently.
Granted, some of these are false positives (that's why it was disabled in
later gcc versions), but some of these are valid and real bugs.
Anyway, as a casual reader, it took me a while to notice all four warnings
listed above are false positives...
Gr{oetje,eeting}s,
Geert
* Re: [GIT PULL] Cgroup writeback support for 4.2
From: Tejun Heo @ 2015-06-26 14:28 UTC
To: Geert Uytterhoeven; +Cc: Jens Axboe, torvalds, linux-kernel@vger.kernel.org
Hello, Geert.
On Fri, Jun 26, 2015 at 03:57:18PM +0200, Geert Uytterhoeven wrote:
> > Can you please tell me the version of gcc which triggered the above
> > warnings?
>
> gcc 4.1.2
I see. I read wrong.
> That's why I keep on using gcc 4.1.2: it still gives build warnings for
> many "used uninitialized" cases that later gcc versions let pass silently.
>
> Granted, some of these are false positives (that's why it was disabled in
> later gcc versions), but some of these are valid and real bugs.
That's kinda surprising. My impression has been that later gcc
versions are doing a lot better job both at actually detecting
problematic ones and avoiding false positives. I'm surprised that
4.1.2 is still catching uninitialized usages later gcc's (and other
static analyzers) can't. Can you roughly say how often it detects
actual problems that later ones can't?
> Anyway, as a casual reader, it took me a while to notice all four warnings
> listed above are false positives...
4.1.2 is more than 8 years old at this point. I really don't want to
kludge the code w/ unnecessary initializations as that'll actually
harm our ability to detect actual problems. The only option would be
refactoring the code so that larger blocks of code are put into
#ifdefed functions but I'm not really sure whether keeping 4.1.2 happy
should be a guideline that we follow when organizing code.
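For comparison, the refactoring alternative mentioned above might look something like the sketch below. It is only an illustration with hypothetical `demo_*` names and a hypothetical `DEMO_CGROUP_WRITEBACK` switch, not a proposal for the actual code: the whole cgroup-side computation moves into a helper with a real and a stub variant, so the caller holds no conditionally initialized locals for any compiler to flag.

```c
/* Hypothetical stand-in for CONFIG_CGROUP_WRITEBACK; 0 means "disabled". */
#define DEMO_CGROUP_WRITEBACK 0

struct demo_limits {
	unsigned long thresh;
	unsigned long bg_thresh;
};

#if DEMO_CGROUP_WRITEBACK
/* Real version: how far the memcg is over its own threshold. */
static unsigned long demo_memcg_excess(unsigned long dirty,
				       const struct demo_limits *lim)
{
	return dirty > lim->thresh ? dirty - lim->thresh : 0;
}
#else
/* Stub version: the memcg side never contributes anything. */
static unsigned long demo_memcg_excess(unsigned long dirty,
				       const struct demo_limits *lim)
{
	(void)dirty;
	(void)lim;
	return 0;
}
#endif

/* Returns the larger of the global and memcg excess over threshold. */
unsigned long demo_pages_over_limit(unsigned long g_dirty,
				    unsigned long g_thresh,
				    unsigned long m_dirty,
				    const struct demo_limits *m_lim)
{
	unsigned long g_excess = g_dirty > g_thresh ? g_dirty - g_thresh : 0;
	unsigned long m_excess = demo_memcg_excess(m_dirty, m_lim);

	return g_excess > m_excess ? g_excess : m_excess;
}
```

The trade-off is the one raised in the message: this keeps old compilers quiet, but it splits the logic across #ifdef'ed helpers and gives up the warnings that would otherwise flag a cgroup path leaking into the global one.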
Thanks.
--
tejun
* Re: [GIT PULL] Cgroup writeback support for 4.2
From: Geert Uytterhoeven @ 2015-06-26 14:56 UTC
To: Tejun Heo; +Cc: Jens Axboe, torvalds, linux-kernel@vger.kernel.org
Hi Tejun,
On Fri, Jun 26, 2015 at 4:28 PM, Tejun Heo <tj@kernel.org> wrote:
>> That's why I keep on using gcc 4.1.2: it still gives build warnings for
>> many "used uninitialized" cases that later gcc versions let pass silently.
>>
>> Granted, some of these are false positives (that's why it was disabled in
>> later gcc versions), but some of these are valid and real bugs.
>
> That's kinda surprising. My impression has been that later gcc
> versions are doing a lot better job both at actually detecting
> problematic ones and avoiding false positives. I'm surprised that
> 4.1.2 is still catching uninitialized usages later gcc's (and other
> static analyzers) can't. Can you roughly say how often it detects
> actual problems that later ones can't?
A handful every merge window. That's why I keep on doing this :-)
Since the release of v4.1:
- https://lkml.org/lkml/2015/6/25/334
- https://lkml.org/lkml/2015/6/25/337
- https://lkml.org/lkml/2015/6/24/88
Gr{oetje,eeting}s,
Geert
* Re: [GIT PULL] Cgroup writeback support for 4.2
From: Tejun Heo @ 2015-06-26 15:23 UTC
To: Geert Uytterhoeven; +Cc: Jens Axboe, torvalds, linux-kernel@vger.kernel.org
Hey,
On Fri, Jun 26, 2015 at 04:56:11PM +0200, Geert Uytterhoeven wrote:
> A handful every merge window. That's why I keep on doing this :-)
>
> Since the release of v4.1:
> - https://lkml.org/lkml/2015/6/25/334
> - https://lkml.org/lkml/2015/6/25/337
> - https://lkml.org/lkml/2015/6/24/88
Heh, I see. I really wish later gcc's did a better job here tho. :(
For now, I'd much prefer to keep the code as-is because those
warnings, when triggered correctly, serve as a good indicator that
cgroup writeback throttling is interfering with the global one.
Thanks.
--
tejun