linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
@ 2017-10-09 21:29 Tejun Heo
  2017-10-09 21:29 ` [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Tejun Heo @ 2017-10-09 21:29 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik; +Cc: kernel-team, linux-kernel, linux-btrfs

Hello,

btrfs has different ways to issue metadata IOs and may end up issuing
metadata or otherwise shared IOs from a non-root cgroup, which can
lead to priority inversion and ineffective IO isolation.

This patchset makes sure that btrfs issues all metadata and shared IOs
from the root cgroup by exempting btree_inodes from cgroup writeback
and explicitly associating shared IOs with the root cgroup.

This patchset containst he following three patches

 [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode
 [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css()
 [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the

and is also available in the following git branch

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-btrfs-metadata

diffstat follows.  Thanks.

 fs/block_dev.c              |    3 +--
 fs/btrfs/check-integrity.c  |    2 +-
 fs/btrfs/disk-io.c          |    4 ++++
 fs/btrfs/ioctl.c            |    6 +++++-
 fs/btrfs/super.c            |    1 -
 fs/buffer.c                 |   12 ++++++++++++
 fs/ext2/inode.c             |    3 ++-
 fs/ext2/super.c             |    1 -
 fs/ext4/inode.c             |    5 ++++-
 fs/ext4/super.c             |    2 --
 fs/fs-writeback.c           |    1 +
 include/linux/backing-dev.h |    2 +-
 include/linux/buffer_head.h |   11 +++++++++++
 include/linux/fs.h          |    3 ++-
 include/linux/writeback.h   |    6 ++++--
 15 files changed, 48 insertions(+), 14 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB
  2017-10-09 21:29 [PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
@ 2017-10-09 21:29 ` Tejun Heo
  2017-10-10  8:56   ` Jan Kara
  2017-10-09 21:29 ` [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css() Tejun Heo
  2017-10-09 21:29 ` [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo
  2 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2017-10-09 21:29 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo,
	Theodore Ts'o, Andreas Dilger, linux-ext4

Currently, filesystem can indiate cgroup writeback support per
superblock; however, depending on the filesystem, especially if inodes
are used to carry metadata, it can be useful to indicate cgroup
writeback support per inode.

This patch replaces the superblock flag SB_I_CGROUPWB with per-inode
S_CGROUPWB, so that cgroup writeback can be enabled selectively.

* block_dev sets the new flag in bdget() when initializing new inode.

* ext2/4 set the new flag in ext?_set_inode_flags() function.

* btrfs sets the new flag in btrfs_update_iflags() function.  Note
  that this automatically excludes btree_inode which doesn't use
  btrfs_update_iflags() during initialization.  This is an intended
  behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
---
 fs/block_dev.c              | 3 +--
 fs/btrfs/ioctl.c            | 4 +++-
 fs/btrfs/super.c            | 1 -
 fs/ext2/inode.c             | 3 ++-
 fs/ext2/super.c             | 1 -
 fs/ext4/inode.c             | 5 ++++-
 fs/ext4/super.c             | 2 --
 include/linux/backing-dev.h | 2 +-
 include/linux/fs.h          | 3 ++-
 9 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088f..ff9c282 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -800,8 +800,6 @@ static struct dentry *bd_mount(struct file_system_type *fs_type,
 {
 	struct dentry *dent;
 	dent = mount_pseudo(fs_type, "bdev:", &bdev_sops, NULL, BDEVFS_MAGIC);
-	if (!IS_ERR(dent))
-		dent->d_sb->s_iflags |= SB_I_CGROUPWB;
 	return dent;
 }
 
@@ -893,6 +891,7 @@ struct block_device *bdget(dev_t dev)
 		inode->i_mode = S_IFBLK;
 		inode->i_rdev = dev;
 		inode->i_bdev = bdev;
+		inode->i_flags |= S_CGROUPWB;
 		inode->i_data.a_ops = &def_blk_aops;
 		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
 		spin_lock(&bdev_lock);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6c7a49f..117cc63 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,9 +150,11 @@ void btrfs_update_iflags(struct inode *inode)
 		new_fl |= S_NOATIME;
 	if (ip->flags & BTRFS_INODE_DIRSYNC)
 		new_fl |= S_DIRSYNC;
+	new_fl |= S_CGROUPWB;
 
 	set_mask_bits(&inode->i_flags,
-		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC,
+		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
+		      S_CGROUPWB,
 		      new_fl);
 }
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 35a128a..46a1488 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1136,7 +1136,6 @@ static int btrfs_fill_super(struct super_block *sb,
 	sb->s_flags |= MS_POSIXACL;
 #endif
 	sb->s_flags |= MS_I_VERSION;
-	sb->s_iflags |= SB_I_CGROUPWB;
 
 	err = super_setup_bdi(sb);
 	if (err) {
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 4dca6f3..3c5d822 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1372,7 +1372,7 @@ void ext2_set_inode_flags(struct inode *inode)
 	unsigned int flags = EXT2_I(inode)->i_flags;
 
 	inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME |
-				S_DIRSYNC | S_DAX);
+				S_DIRSYNC | S_DAX | S_CGROUPWB);
 	if (flags & EXT2_SYNC_FL)
 		inode->i_flags |= S_SYNC;
 	if (flags & EXT2_APPEND_FL)
@@ -1385,6 +1385,7 @@ void ext2_set_inode_flags(struct inode *inode)
 		inode->i_flags |= S_DIRSYNC;
 	if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode))
 		inode->i_flags |= S_DAX;
+	inode->i_flags |= S_CGROUPWB;
 }
 
 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1458706..e6ba669e 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -922,7 +922,6 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
 		((EXT2_SB(sb)->s_mount_opt & EXT2_MOUNT_POSIX_ACL) ?
 		 MS_POSIXACL : 0);
-	sb->s_iflags |= SB_I_CGROUPWB;
 
 	if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV &&
 	    (EXT2_HAS_COMPAT_FEATURE(sb, ~0U) ||
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 31db875..344f12b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4591,8 +4591,11 @@ void ext4_set_inode_flags(struct inode *inode)
 	    !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
 	    !ext4_encrypted_inode(inode))
 		new_fl |= S_DAX;
+	if (test_opt(inode->i_sb, DATA_FLAGS) != EXT4_MOUNT_JOURNAL_DATA)
+		new_fl |= S_CGROUPWB;
 	inode_set_flags(inode, new_fl,
-			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
+			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX|
+			S_CGROUPWB);
 }
 
 static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b104096..44ddf1d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3620,8 +3620,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		}
 		if (test_opt(sb, DELALLOC))
 			clear_opt(sb, DELALLOC);
-	} else {
-		sb->s_iflags |= SB_I_CGROUPWB;
 	}
 
 	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 854e1bd..7274de0 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -246,7 +246,7 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
 		cgroup_subsys_on_dfl(io_cgrp_subsys) &&
 		bdi_cap_account_dirty(bdi) &&
 		(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
-		(inode->i_sb->s_iflags & SB_I_CGROUPWB);
+		IS_CGROUPWB(inode);
 }
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 13dab19..be2d8a6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1308,7 +1308,6 @@ extern int send_sigurg(struct fown_struct *fown);
 #define UMOUNT_UNUSED	0x80000000	/* Flag guaranteed to be unused */
 
 /* sb->s_iflags */
-#define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
 #define SB_I_NODEV	0x00000004	/* Ignore devices on this fs */
 
@@ -1853,6 +1852,7 @@ struct super_operations {
 #else
 #define S_DAX		0	/* Make all the DAX code disappear */
 #endif
+#define S_CGROUPWB	16384	/* Enable cgroup writeback for this inode */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1892,6 +1892,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
 #define IS_AUTOMOUNT(inode)	((inode)->i_flags & S_AUTOMOUNT)
 #define IS_NOSEC(inode)		((inode)->i_flags & S_NOSEC)
 #define IS_DAX(inode)		((inode)->i_flags & S_DAX)
+#define IS_CGROUPWB(inode)	((inode)->i_flags & S_CGROUPWB)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css()
  2017-10-09 21:29 [PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
  2017-10-09 21:29 ` [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
@ 2017-10-09 21:29 ` Tejun Heo
  2017-10-10  9:31   ` Jan Kara
  2017-10-09 21:29 ` [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo
  2 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2017-10-09 21:29 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo

Add wbc->blkcg_css so that the blkcg_css association can be specified
independently and implement submit_bh_blkcg_css() using it.  This will
be used to override cgroup membership on specific buffer_heads.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/buffer.c                 | 12 ++++++++++++
 fs/fs-writeback.c           |  1 +
 include/linux/buffer_head.h | 11 +++++++++++
 include/linux/writeback.h   |  6 ++++--
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 170df85..fac4f9a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3143,6 +3143,18 @@ int submit_bh(int op, int op_flags, struct buffer_head *bh)
 }
 EXPORT_SYMBOL(submit_bh);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+int submit_bh_blkcg_css(int op, int op_flags,  struct buffer_head *bh,
+			struct cgroup_subsys_state *blkcg_css)
+{
+	struct writeback_control wbc = { .blkcg_css = blkcg_css };
+
+	/* @wbc is used just to override the bio's blkcg_css */
+	return submit_bh_wbc(op, op_flags, bh, 0, &wbc);
+}
+EXPORT_SYMBOL(submit_bh_blkcg_css);
+#endif
+
 /**
  * ll_rw_block: low-level access to block devices (DEPRECATED)
  * @op: whether to %READ or %WRITE
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 245c430..cd882e6 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -538,6 +538,7 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
 	}
 
 	wbc->wb = inode_to_wb(inode);
+	wbc->blkcg_css = wbc->wb->blkcg_css;
 	wbc->inode = inode;
 
 	wbc->wb_id = wbc->wb->memcg_css->id;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c8dae55..abb4dd4 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
+#include <linux/cgroup.h>
 
 #ifdef CONFIG_BLOCK
 
@@ -205,6 +206,16 @@ int bh_submit_read(struct buffer_head *bh);
 loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
 				 loff_t length, int whence);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+int submit_bh_blkcg(int op, int op_flags, struct buffer_head *bh,
+		    struct cgroup_subsys_state *blkcg_css);
+#else
+static inline int submit_bh_blkcg(int op, int op_flags, struct buffer_head *bh,
+				  struct cgroup_subsys_state *blkcg_css)
+{
+	return submit_bh(op, op_flags, bh);
+}
+#endif
 extern int buffer_heads_over_limit;
 
 /*
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index d581579..81e5946 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -91,6 +91,8 @@ struct writeback_control {
 	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct bdi_writeback *wb;	/* wb this writeback is issued under */
+	struct cgroup_subsys_state *blkcg_css; /* usually wb->blkcg_css but
+						  may be overridden */
 	struct inode *inode;		/* inode being written out */
 
 	/* foreign inode detection, see wbc_detach_inode() */
@@ -277,8 +279,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
 	 * behind a slow cgroup.  Ultimately, we want pageout() to kick off
 	 * regular writeback instead of writing things out itself.
 	 */
-	if (wbc->wb)
-		bio_associate_blkcg(bio, wbc->wb->blkcg_css);
+	if (wbc->blkcg_css)
+		bio_associate_blkcg(bio, wbc->blkcg_css);
 }
 
 #else	/* CONFIG_CGROUP_WRITEBACK */
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-09 21:29 [PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
  2017-10-09 21:29 ` [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
  2017-10-09 21:29 ` [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css() Tejun Heo
@ 2017-10-09 21:29 ` Tejun Heo
  2 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2017-10-09 21:29 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo

Issuing metdata or otherwise shared IOs from !root cgroup can lead to
priority inversion.  This patch ensures that those IOs are always
issued from the root cgroup.

This patch updates btrfs_update_iflags() to not set S_CGROUPWB on
btree_inodes.  This isn't strictly necessary as those inodes don't
call the function during init; however, this serves as documentation
and prevents possible future mistakes.  If this isn't desirable,
please feel free to drop the section.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/check-integrity.c | 2 +-
 fs/btrfs/disk-io.c         | 4 ++++
 fs/btrfs/ioctl.c           | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7d5a9b5..058dea6 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
 	struct btrfsic_dev_state *dev_state;
 
 	if (!btrfsic_is_initialized)
-		return submit_bh(op, op_flags, bh);
+		return submit_bh_blkcg_css(op, op_flags, blkcg_root_css);
 
 	mutex_lock(&btrfsic_mutex);
 	/* since btrfsic_submit_bh() might also be called before
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab84..fe8bbe1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio,
 	int async = check_async_write(bio_flags);
 	blk_status_t ret;
 
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	if (bio_op(bio) != REQ_OP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
@@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device)
 		return;
 
 	bio_reset(bio);
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	bio->bi_end_io = btrfs_end_empty_barrier;
 	bio_set_dev(bio, device->bdev);
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 117cc63..8a7db6c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode)
 		new_fl |= S_NOATIME;
 	if (ip->flags & BTRFS_INODE_DIRSYNC)
 		new_fl |= S_DIRSYNC;
-	new_fl |= S_CGROUPWB;
+	/* btree_inodes are always in the root cgroup */
+	if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID)
+		new_fl |= S_CGROUPWB;
 
 	set_mask_bits(&inode->i_flags,
 		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB
  2017-10-09 21:29 ` [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
@ 2017-10-10  8:56   ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2017-10-10  8:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs,
	Theodore Ts'o, Andreas Dilger, linux-ext4

On Mon 09-10-17 14:29:09, Tejun Heo wrote:
> Currently, filesystem can indiate cgroup writeback support per
> superblock; however, depending on the filesystem, especially if inodes
> are used to carry metadata, it can be useful to indicate cgroup
> writeback support per inode.
> 
> This patch replaces the superblock flag SB_I_CGROUPWB with per-inode
> S_CGROUPWB, so that cgroup writeback can be enabled selectively.
> 
> * block_dev sets the new flag in bdget() when initializing new inode.
> 
> * ext2/4 set the new flag in ext?_set_inode_flags() function.
> 
> * btrfs sets the new flag in btrfs_update_iflags() function.  Note
>   that this automatically excludes btree_inode which doesn't use
>   btrfs_update_iflags() during initialization.  This is an intended
>   behavior change.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Chris Mason <clm@fb.com>
> Cc: Josef Bacik <jbacik@fb.com>
> Cc: linux-btrfs@vger.kernel.org
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Cc: linux-ext4@vger.kernel.org

...

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 31db875..344f12b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4591,8 +4591,11 @@ void ext4_set_inode_flags(struct inode *inode)
>  	    !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
>  	    !ext4_encrypted_inode(inode))
>  		new_fl |= S_DAX;
> +	if (test_opt(inode->i_sb, DATA_FLAGS) != EXT4_MOUNT_JOURNAL_DATA)
> +		new_fl |= S_CGROUPWB;

Use ext4_should_journal_data(inode) here? Ext4 can be journalling data also
because of per-inode flag.

Otherwise the patch looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css()
  2017-10-09 21:29 ` [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css() Tejun Heo
@ 2017-10-10  9:31   ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2017-10-10  9:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Mon 09-10-17 14:29:10, Tejun Heo wrote:
> Add wbc->blkcg_css so that the blkcg_css association can be specified
> independently and implement submit_bh_blkcg_css() using it.  This will
> be used to override cgroup membership on specific buffer_heads.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@kernel.dk>

Hum, I dislike growing wbc just to pass blkcg_css through one function in
fs/buffer.c (in otherwise empty wbc). Not that space would be a big concern
for wbc but it just gets messier... Cannot we just refactor the code a
bit like:

1) Create function

struct bio *create_bio_bh(int op, int op_flags, struct buffer_head *bh,
			  enum rw_hint write_hint);

which would create bio from bh.

2) Make submit_bh(), submit_bh_wbc(), submit_bh_blkcg_css() use this and
the latter two would further tweak the bio as needed before submitting.

Thoughts?

								Honza


> ---
>  fs/buffer.c                 | 12 ++++++++++++
>  fs/fs-writeback.c           |  1 +
>  include/linux/buffer_head.h | 11 +++++++++++
>  include/linux/writeback.h   |  6 ++++--
>  4 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 170df85..fac4f9a 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3143,6 +3143,18 @@ int submit_bh(int op, int op_flags, struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(submit_bh);
>  
> +#ifdef CONFIG_CGROUP_WRITEBACK
> +int submit_bh_blkcg_css(int op, int op_flags,  struct buffer_head *bh,
> +			struct cgroup_subsys_state *blkcg_css)
> +{
> +	struct writeback_control wbc = { .blkcg_css = blkcg_css };
> +
> +	/* @wbc is used just to override the bio's blkcg_css */
> +	return submit_bh_wbc(op, op_flags, bh, 0, &wbc);
> +}
> +EXPORT_SYMBOL(submit_bh_blkcg_css);
> +#endif
> +
>  /**
>   * ll_rw_block: low-level access to block devices (DEPRECATED)
>   * @op: whether to %READ or %WRITE
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 245c430..cd882e6 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -538,6 +538,7 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
>  	}
>  
>  	wbc->wb = inode_to_wb(inode);
> +	wbc->blkcg_css = wbc->wb->blkcg_css;
>  	wbc->inode = inode;
>  
>  	wbc->wb_id = wbc->wb->memcg_css->id;
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index c8dae55..abb4dd4 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -13,6 +13,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/wait.h>
>  #include <linux/atomic.h>
> +#include <linux/cgroup.h>
>  
>  #ifdef CONFIG_BLOCK
>  
> @@ -205,6 +206,16 @@ int bh_submit_read(struct buffer_head *bh);
>  loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
>  				 loff_t length, int whence);
>  
> +#ifdef CONFIG_CGROUP_WRITEBACK
> +int submit_bh_blkcg(int op, int op_flags, struct buffer_head *bh,
> +		    struct cgroup_subsys_state *blkcg_css);
> +#else
> +static inline int submit_bh_blkcg(int op, int op_flags, struct buffer_head *bh,
> +				  struct cgroup_subsys_state *blkcg_css)
> +{
> +	return submit_bh(op, op_flags, bh);
> +}
> +#endif
>  extern int buffer_heads_over_limit;
>  
>  /*
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index d581579..81e5946 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -91,6 +91,8 @@ struct writeback_control {
>  	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
>  #ifdef CONFIG_CGROUP_WRITEBACK
>  	struct bdi_writeback *wb;	/* wb this writeback is issued under */
> +	struct cgroup_subsys_state *blkcg_css; /* usually wb->blkcg_css but
> +						  may be overridden */
>  	struct inode *inode;		/* inode being written out */
>  
>  	/* foreign inode detection, see wbc_detach_inode() */
> @@ -277,8 +279,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
>  	 * behind a slow cgroup.  Ultimately, we want pageout() to kick off
>  	 * regular writeback instead of writing things out itself.
>  	 */
> -	if (wbc->wb)
> -		bio_associate_blkcg(bio, wbc->wb->blkcg_css);
> +	if (wbc->blkcg_css)
> +		bio_associate_blkcg(bio, wbc->blkcg_css);
>  }
>  
>  #else	/* CONFIG_CGROUP_WRITEBACK */
> -- 
> 2.9.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-10-10  9:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-09 21:29 [PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
2017-10-09 21:29 ` [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
2017-10-10  8:56   ` Jan Kara
2017-10-09 21:29 ` [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css() Tejun Heo
2017-10-10  9:31   ` Jan Kara
2017-10-09 21:29 ` [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).