* [PATCH v6 0/6] Btrfs: implement swap file support
@ 2018-09-07 7:39 Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS Omar Sandoval
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
Hi,
This series implements swap file support for Btrfs.
Compared to v5 [1], this is pretty much feature-complete. It now
supports:
- Balance (skips block groups containing an active swap file)
- Resize (error if trying to shrink past a block group containing an
active swap file, allowed otherwise)
- Device delete/replace (as long as the device in question does not
contain an active swap file)
This implementation Chris and I came up with is much cleaner than my
earlier ideas: instead of adding any counters to struct
btrfs_block_group_cache or struct btrfs_device, we just have a small
red-black tree of block groups and devices which contain an active
swapfile.
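As a rough illustration (the real definitions and the lookup live in
patches 4 and 6), each pin is just a small node keyed on the pinned
object:

struct btrfs_swapfile_pin {
	struct rb_node node;
	void *ptr;		/* block group or device */
	struct inode *inode;	/* swapfile holding the pin */
	bool is_block_group;	/* which type ptr points to */
};

Balance, resize, device remove, and device replace look the block group
or device up in this tree and skip it or fail with -ETXTBSY while the
pin exists.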
I updated the xfstests for this series [2] to test this new
functionality, and put it through the same tests as v5.
Based on v4.19-rc2, please take a look.
Thanks!
1: https://www.spinics.net/lists/linux-btrfs/msg81550.html
2: https://github.com/osandov/xfstests/tree/btrfs-swap
Omar Sandoval (6):
mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
mm: export add_swap_extent()
vfs: update swap_{,de}activate documentation
Btrfs: prevent ioctls from interfering with a swap file
Btrfs: rename get_chunk_map() and make it non-static
Btrfs: support swap files
Documentation/filesystems/Locking | 17 +-
Documentation/filesystems/vfs.txt | 12 +-
fs/btrfs/ctree.h | 24 +++
fs/btrfs/dev-replace.c | 8 +
fs/btrfs/disk-io.c | 4 +
fs/btrfs/inode.c | 316 ++++++++++++++++++++++++++++++
fs/btrfs/ioctl.c | 31 ++-
fs/btrfs/relocation.c | 18 +-
fs/btrfs/volumes.c | 71 +++++--
fs/btrfs/volumes.h | 9 +
include/linux/swap.h | 13 +-
mm/page_io.c | 6 +-
mm/swapfile.c | 14 +-
13 files changed, 492 insertions(+), 51 deletions(-)
--
2.18.0
* [PATCH v6 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
@ 2018-09-07 7:39 ` Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 2/6] mm: export add_swap_extent() Omar Sandoval
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
The SWP_FILE flag serves two purposes: to make swap_{read,write}page()
go through the filesystem, and to make swapoff() call
->swap_deactivate(). For Btrfs, we want the latter but not the former,
so split this flag into two. This makes us always call
->swap_deactivate() if ->swap_activate() succeeded, not just if it
didn't add any swap extents itself.
This also resolves the issue of the very misleading name of SWP_FILE,
which is only used for swap files over NFS.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
include/linux/swap.h | 13 +++++++------
mm/page_io.c | 6 +++---
mm/swapfile.c | 13 ++++++++-----
3 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8e2c11e692ba..0fda0aa743f0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -167,13 +167,14 @@ enum {
SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */
SWP_CONTINUED = (1 << 5), /* swap_map has count continuation */
SWP_BLKDEV = (1 << 6), /* its a block device */
- SWP_FILE = (1 << 7), /* set after swap_activate success */
- SWP_AREA_DISCARD = (1 << 8), /* single-time swap area discards */
- SWP_PAGE_DISCARD = (1 << 9), /* freed swap page-cluster discards */
- SWP_STABLE_WRITES = (1 << 10), /* no overwrite PG_writeback pages */
- SWP_SYNCHRONOUS_IO = (1 << 11), /* synchronous IO is efficient */
+ SWP_ACTIVATED = (1 << 7), /* set after swap_activate success */
+ SWP_FS = (1 << 8), /* swap file goes through fs */
+ SWP_AREA_DISCARD = (1 << 9), /* single-time swap area discards */
+ SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
+ SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
+ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
/* add others here before... */
- SWP_SCANNING = (1 << 12), /* refcount in scan_swap_map */
+ SWP_SCANNING = (1 << 13), /* refcount in scan_swap_map */
};
#define SWAP_CLUSTER_MAX 32UL
diff --git a/mm/page_io.c b/mm/page_io.c
index aafd19ec1db4..e8653c368069 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -283,7 +283,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
struct swap_info_struct *sis = page_swap_info(page);
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
- if (sis->flags & SWP_FILE) {
+ if (sis->flags & SWP_FS) {
struct kiocb kiocb;
struct file *swap_file = sis->swap_file;
struct address_space *mapping = swap_file->f_mapping;
@@ -365,7 +365,7 @@ int swap_readpage(struct page *page, bool synchronous)
goto out;
}
- if (sis->flags & SWP_FILE) {
+ if (sis->flags & SWP_FS) {
struct file *swap_file = sis->swap_file;
struct address_space *mapping = swap_file->f_mapping;
@@ -423,7 +423,7 @@ int swap_set_page_dirty(struct page *page)
{
struct swap_info_struct *sis = page_swap_info(page);
- if (sis->flags & SWP_FILE) {
+ if (sis->flags & SWP_FS) {
struct address_space *mapping = sis->swap_file->f_mapping;
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d954b71c4f9c..d3f95833d12e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -989,7 +989,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
goto nextsi;
}
if (size == SWAPFILE_CLUSTER) {
- if (!(si->flags & SWP_FILE))
+ if (!(si->flags & SWP_FS))
n_ret = swap_alloc_cluster(si, swp_entries);
} else
n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
@@ -2310,12 +2310,13 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
kfree(se);
}
- if (sis->flags & SWP_FILE) {
+ if (sis->flags & SWP_ACTIVATED) {
struct file *swap_file = sis->swap_file;
struct address_space *mapping = swap_file->f_mapping;
- sis->flags &= ~SWP_FILE;
- mapping->a_ops->swap_deactivate(swap_file);
+ sis->flags &= ~SWP_ACTIVATED;
+ if (mapping->a_ops->swap_deactivate)
+ mapping->a_ops->swap_deactivate(swap_file);
}
}
@@ -2411,8 +2412,10 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
if (mapping->a_ops->swap_activate) {
ret = mapping->a_ops->swap_activate(sis, swap_file, span);
+ if (ret >= 0)
+ sis->flags |= SWP_ACTIVATED;
if (!ret) {
- sis->flags |= SWP_FILE;
+ sis->flags |= SWP_FS;
ret = add_swap_extent(sis, 0, sis->max, 0);
*span = sis->pages;
}
--
2.18.0
* [PATCH v6 2/6] mm: export add_swap_extent()
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS Omar Sandoval
@ 2018-09-07 7:39 ` Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 3/6] vfs: update swap_{,de}activate documentation Omar Sandoval
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
Btrfs will need this for swap file support.
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
mm/swapfile.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d3f95833d12e..51cb30de17bc 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2365,6 +2365,7 @@ add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
list_add_tail(&new_se->list, &sis->first_swap_extent.list);
return 1;
}
+EXPORT_SYMBOL_GPL(add_swap_extent);
/*
* A `swap extent' is a simple thing which maps a contiguous range of pages
--
2.18.0
* [PATCH v6 3/6] vfs: update swap_{,de}activate documentation
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 2/6] mm: export add_swap_extent() Omar Sandoval
@ 2018-09-07 7:39 ` Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 4/6] Btrfs: prevent ioctls from interfering with a swap file Omar Sandoval
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
The documentation for these functions is wrong in several ways:
- swap_activate() is called with the inode locked
- swap_activate() takes a swap_info_struct * and a sector_t *
- swap_activate() can also return a positive number of extents it added
itself
- swap_deactivate() does not return anything
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Documentation/filesystems/Locking | 17 +++++++----------
Documentation/filesystems/vfs.txt | 12 ++++++++----
2 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index efea228ccd8a..b970c8c2ee22 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -210,8 +210,9 @@ prototypes:
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
- int (*swap_activate)(struct file *);
- int (*swap_deactivate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *, struct file *,
+ sector_t *);
+ void (*swap_deactivate)(struct file *);
locking rules:
All except set_page_dirty and freepage may block
@@ -235,8 +236,8 @@ putback_page: yes
launder_page: yes
is_partially_uptodate: yes
error_remove_page: yes
-swap_activate: no
-swap_deactivate: no
+swap_activate: yes
+swap_deactivate: no
->write_begin(), ->write_end() and ->readpage() may be called from
the request handler (/dev/loop).
@@ -333,14 +334,10 @@ cleaned, or an error value if not. Note that in order to prevent the page
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.
- ->swap_activate will be called with a non-zero argument on
-files backing (non block device backed) swapfiles. A return value
-of zero indicates success, in which case this file can be used for
-backing swapspace. The swapspace operations will be proxied to the
-address space operations.
+ ->swap_activate is called from sys_swapon() with the inode locked.
->swap_deactivate() will be called in the sys_swapoff()
-path after ->swap_activate() returned success.
+path after ->swap_activate() returned success. The inode is not locked.
----------------------- file_lock_operations ------------------------------
prototypes:
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 4b2084d0f1fb..40d6d6d4b76b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -652,8 +652,9 @@ struct address_space_operations {
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
- int (*swap_activate)(struct file *);
- int (*swap_deactivate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *, struct file *,
+ sector_t *);
+ void (*swap_deactivate)(struct file *);
};
writepage: called by the VM to write a dirty page to backing store.
@@ -830,8 +831,11 @@ struct address_space_operations {
swap_activate: Called when swapon is used on a file to allocate
space if necessary and pin the block lookup information in
- memory. A return value of zero indicates success,
- in which case this file can be used to back swapspace.
+ memory. If this returns zero, the swap system will call the address
+ space operations ->readpage() and ->direct_IO(). Alternatively, this
+ may call add_swap_extent() and return the number of extents added, in
+ which case the swap system will use the provided blocks directly
+ instead of going through the filesystem.
swap_deactivate: Called during swapoff on files where swap_activate
was successful.
--
2.18.0
* [PATCH v6 4/6] Btrfs: prevent ioctls from interfering with a swap file
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
` (2 preceding siblings ...)
2018-09-07 7:39 ` [PATCH v6 3/6] vfs: update swap_{,de}activate documentation Omar Sandoval
@ 2018-09-07 7:39 ` Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 6/6] Btrfs: support swap files Omar Sandoval
5 siblings, 0 replies; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
When a swap file is active, we must make sure that the extents of the
file are not moved and that they don't become shared. That means that
the following are not safe:
- chattr +c (enable compression)
- reflink
- dedupe
- snapshot
- defrag
Don't allow those to happen on an active swap file.
Additionally, balance, resize, device remove, and device replace are
also unsafe if they affect an active swapfile. Add a red-black tree of
block groups and devices which contain an active swapfile. Relocation
checks each block group against this tree and skips it or errors out for
balance or resize, respectively. Device remove and device replace check
the tree for the device they will operate on.
Note that we don't have to worry about chattr -C (disable nocow), which
we ignore for non-empty files, because an active swapfile must be
non-empty and can't be truncated. We also don't have to worry about
autodefrag because it's only done on COW files. Truncate and fallocate
are already taken care of by the generic code. Device add doesn't do
relocation so it's not an issue, either.
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/ctree.h | 24 +++++++++++++++++++++
fs/btrfs/dev-replace.c | 8 +++++++
fs/btrfs/disk-io.c | 4 ++++
fs/btrfs/ioctl.c | 31 +++++++++++++++++++++++---
fs/btrfs/relocation.c | 18 ++++++++++++----
fs/btrfs/volumes.c | 49 +++++++++++++++++++++++++++++++++++++-----
6 files changed, 122 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 53af9f5253f4..e37ce40db380 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -716,6 +716,23 @@ struct btrfs_fs_devices;
struct btrfs_balance_control;
struct btrfs_delayed_root;
+/*
+ * Block group or device which contains an active swapfile. Used for preventing
+ * unsafe operations while a swapfile is active.
+ */
+struct btrfs_swapfile_pin {
+ struct rb_node node;
+ void *ptr;
+ struct inode *inode;
+ /*
+ * If true, ptr points to a struct btrfs_block_group_cache. Otherwise,
+ * ptr points to a struct btrfs_device.
+ */
+ bool is_block_group;
+};
+
+bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
+
#define BTRFS_FS_BARRIER 1
#define BTRFS_FS_CLOSING_START 2
#define BTRFS_FS_CLOSING_DONE 3
@@ -1121,6 +1138,10 @@ struct btrfs_fs_info {
u32 sectorsize;
u32 stripesize;
+ /* Block groups and devices containing active swapfiles. */
+ spinlock_t swapfile_pins_lock;
+ struct rb_root swapfile_pins;
+
#ifdef CONFIG_BTRFS_FS_REF_VERIFY
spinlock_t ref_verify_lock;
struct rb_root block_tree;
@@ -1285,6 +1306,9 @@ struct btrfs_root {
spinlock_t qgroup_meta_rsv_lock;
u64 qgroup_meta_rsv_pertrans;
u64 qgroup_meta_rsv_prealloc;
+
+ /* Number of active swapfiles */
+ atomic_t nr_swapfiles;
};
struct btrfs_file_private {
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index dec01970d8c5..09d2cee2635b 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -414,6 +414,14 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
if (ret)
return ret;
+ if (btrfs_pinned_by_swapfile(fs_info, src_device)) {
+ btrfs_info_in_rcu(fs_info,
+ "cannot replace device %s (devid %llu) due to active swapfile",
+ btrfs_dev_name(src_device),
+ src_device->devid);
+ return -ETXTBSY;
+ }
+
ret = btrfs_init_dev_replace_tgtdev(fs_info, tgtdev_name,
src_device, &tgt_device);
if (ret)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5124c15705ce..0ab3527bedd1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1187,6 +1187,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
atomic_set(&root->log_batch, 0);
refcount_set(&root->refs, 1);
atomic_set(&root->will_be_snapshotted, 0);
+ atomic_set(&root->nr_swapfiles, 0);
root->log_transid = 0;
root->log_transid_committed = -1;
root->last_log_commit = 0;
@@ -2781,6 +2782,9 @@ int open_ctree(struct super_block *sb,
fs_info->sectorsize = 4096;
fs_info->stripesize = 4096;
+ spin_lock_init(&fs_info->swapfile_pins_lock);
+ fs_info->swapfile_pins = RB_ROOT;
+
ret = btrfs_alloc_stripe_hash_table(fs_info);
if (ret) {
err = ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 63600dc2ac4c..d083bf21aabe 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -290,6 +290,11 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
} else if (fsflags & FS_COMPR_FL) {
const char *comp;
+ if (IS_SWAPFILE(inode)) {
+ ret = -ETXTBSY;
+ goto out_unlock;
+ }
+
binode->flags |= BTRFS_INODE_COMPRESS;
binode->flags &= ~BTRFS_INODE_NOCOMPRESS;
@@ -751,6 +756,12 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
return -EINVAL;
+ if (atomic_read(&root->nr_swapfiles)) {
+ btrfs_info(fs_info,
+ "cannot snapshot subvolume with active swapfile");
+ return -ETXTBSY;
+ }
+
pending_snapshot = kzalloc(sizeof(*pending_snapshot), GFP_KERNEL);
if (!pending_snapshot)
return -ENOMEM;
@@ -1487,9 +1498,13 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
}
inode_lock(inode);
- if (do_compress)
- BTRFS_I(inode)->defrag_compress = compress_type;
- ret = cluster_pages_for_defrag(inode, pages, i, cluster);
+ if (IS_SWAPFILE(inode)) {
+ ret = -ETXTBSY;
+ } else {
+ if (do_compress)
+ BTRFS_I(inode)->defrag_compress = compress_type;
+ ret = cluster_pages_for_defrag(inode, pages, i, cluster);
+ }
if (ret < 0) {
inode_unlock(inode);
goto out_ra;
@@ -3538,6 +3553,11 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
+ if (IS_SWAPFILE(src) || IS_SWAPFILE(dst)) {
+ ret = -ETXTBSY;
+ goto out_unlock;
+ }
+
tail_len = olen % BTRFS_MAX_DEDUPE_LEN;
chunk_count = div_u64(olen, BTRFS_MAX_DEDUPE_LEN);
if (chunk_count == 0)
@@ -4234,6 +4254,11 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
goto out_unlock;
}
+ if (IS_SWAPFILE(src) || IS_SWAPFILE(inode)) {
+ ret = -ETXTBSY;
+ goto out_unlock;
+ }
+
/* determine range to clone */
ret = -EINVAL;
if (off + len > src->i_size || off + len < off)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 8783a1776540..7468a0f55cd2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4226,6 +4226,7 @@ static void describe_relocation(struct btrfs_fs_info *fs_info,
*/
int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
{
+ struct btrfs_block_group_cache *bg;
struct btrfs_root *extent_root = fs_info->extent_root;
struct reloc_control *rc;
struct inode *inode;
@@ -4234,14 +4235,23 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
int rw = 0;
int err = 0;
+ bg = btrfs_lookup_block_group(fs_info, group_start);
+ if (!bg)
+ return -ENOENT;
+
+ if (btrfs_pinned_by_swapfile(fs_info, bg)) {
+ btrfs_put_block_group(bg);
+ return -ETXTBSY;
+ }
+
rc = alloc_reloc_control();
- if (!rc)
+ if (!rc) {
+ btrfs_put_block_group(bg);
return -ENOMEM;
+ }
rc->extent_root = extent_root;
-
- rc->block_group = btrfs_lookup_block_group(fs_info, group_start);
- BUG_ON(!rc->block_group);
+ rc->block_group = bg;
ret = btrfs_inc_block_group_ro(rc->block_group);
if (ret) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index da86706123ff..207e36b70d9b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1882,6 +1882,14 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
if (ret)
goto out;
+ if (btrfs_pinned_by_swapfile(fs_info, device)) {
+ btrfs_info_in_rcu(fs_info,
+ "cannot remove device %s (devid %llu) due to active swapfile",
+ rcu_str_deref(device->name), device->devid);
+ ret = -ETXTBSY;
+ goto out;
+ }
+
if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
ret = BTRFS_ERROR_DEV_TGT_REPLACE;
goto out;
@@ -3626,10 +3634,15 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
ret = btrfs_relocate_chunk(fs_info, found_key.offset);
mutex_unlock(&fs_info->delete_unused_bgs_mutex);
- if (ret && ret != -ENOSPC)
- goto error;
if (ret == -ENOSPC) {
enospc_errors++;
+ } else if (ret == -ETXTBSY) {
+ btrfs_info(fs_info,
+ "skipping relocation of block group %llu due to active swapfile",
+ found_key.offset);
+ ret = 0;
+ } else if (ret) {
+ goto error;
} else {
spin_lock(&fs_info->balance_lock);
bctl->stat.completed++;
@@ -4426,10 +4439,16 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
ret = btrfs_relocate_chunk(fs_info, chunk_offset);
mutex_unlock(&fs_info->delete_unused_bgs_mutex);
- if (ret && ret != -ENOSPC)
- goto done;
- if (ret == -ENOSPC)
+ if (ret == -ENOSPC) {
failed++;
+ } else {
+ if (ret == -ETXTBSY) {
+ btrfs_info(fs_info,
+ "could not shrink block group %llu due to active swapfile",
+ chunk_offset);
+ }
+ goto done;
+ }
} while (key.offset-- > 0);
if (failed && !retried) {
@@ -7525,3 +7544,23 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
btrfs_free_path(path);
return ret;
}
+
+bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr)
+{
+ struct btrfs_swapfile_pin *sp;
+ struct rb_node *node;
+
+ spin_lock(&fs_info->swapfile_pins_lock);
+ node = fs_info->swapfile_pins.rb_node;
+ while (node) {
+ sp = rb_entry(node, struct btrfs_swapfile_pin, node);
+ if (ptr < sp->ptr)
+ node = node->rb_left;
+ else if (ptr > sp->ptr)
+ node = node->rb_right;
+ else
+ break;
+ }
+ spin_unlock(&fs_info->swapfile_pins_lock);
+ return node != NULL;
+}
--
2.18.0
* [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
` (3 preceding siblings ...)
2018-09-07 7:39 ` [PATCH v6 4/6] Btrfs: prevent ioctls from interfering with a swap file Omar Sandoval
@ 2018-09-07 7:39 ` Omar Sandoval
2018-09-07 7:54 ` Nikolay Borisov
2018-09-07 7:39 ` [PATCH v6 6/6] Btrfs: support swap files Omar Sandoval
5 siblings, 1 reply; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
The Btrfs swap code is going to need it, so give it a btrfs_ prefix and
make it non-static.
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/volumes.c | 22 +++++++++++-----------
fs/btrfs/volumes.h | 9 +++++++++
2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 207e36b70d9b..514932c47bcd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2714,8 +2714,8 @@ static int btrfs_del_sys_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
return ret;
}
-static struct extent_map *get_chunk_map(struct btrfs_fs_info *fs_info,
- u64 logical, u64 length)
+struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
+ u64 logical, u64 length)
{
struct extent_map_tree *em_tree;
struct extent_map *em;
@@ -2752,7 +2752,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
int i, ret = 0;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
- em = get_chunk_map(fs_info, chunk_offset, 1);
+ em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
if (IS_ERR(em)) {
/*
* This is a logic error, but we don't want to just rely on the
@@ -4897,7 +4897,7 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
int i = 0;
int ret = 0;
- em = get_chunk_map(fs_info, chunk_offset, chunk_size);
+ em = btrfs_get_chunk_map(fs_info, chunk_offset, chunk_size);
if (IS_ERR(em))
return PTR_ERR(em);
@@ -5039,7 +5039,7 @@ int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset)
int miss_ndevs = 0;
int i;
- em = get_chunk_map(fs_info, chunk_offset, 1);
+ em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
if (IS_ERR(em))
return 1;
@@ -5099,7 +5099,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
struct map_lookup *map;
int ret;
- em = get_chunk_map(fs_info, logical, len);
+ em = btrfs_get_chunk_map(fs_info, logical, len);
if (IS_ERR(em))
/*
* We could return errors for these cases, but that could get
@@ -5145,7 +5145,7 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
struct map_lookup *map;
unsigned long len = fs_info->sectorsize;
- em = get_chunk_map(fs_info, logical, len);
+ em = btrfs_get_chunk_map(fs_info, logical, len);
if (!WARN_ON(IS_ERR(em))) {
map = em->map_lookup;
@@ -5162,7 +5162,7 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
struct map_lookup *map;
int ret = 0;
- em = get_chunk_map(fs_info, logical, len);
+ em = btrfs_get_chunk_map(fs_info, logical, len);
if(!WARN_ON(IS_ERR(em))) {
map = em->map_lookup;
@@ -5321,7 +5321,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
/* discard always return a bbio */
ASSERT(bbio_ret);
- em = get_chunk_map(fs_info, logical, length);
+ em = btrfs_get_chunk_map(fs_info, logical, length);
if (IS_ERR(em))
return PTR_ERR(em);
@@ -5647,7 +5647,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
return __btrfs_map_block_for_discard(fs_info, logical,
*length, bbio_ret);
- em = get_chunk_map(fs_info, logical, *length);
+ em = btrfs_get_chunk_map(fs_info, logical, *length);
if (IS_ERR(em))
return PTR_ERR(em);
@@ -5946,7 +5946,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
u64 rmap_len;
int i, j, nr = 0;
- em = get_chunk_map(fs_info, chunk_start, 1);
+ em = btrfs_get_chunk_map(fs_info, chunk_start, 1);
if (IS_ERR(em))
return -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 23e9285d88de..d68c8a05a774 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -465,6 +465,15 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
u64 chunk_offset, u64 chunk_size);
int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
+/**
+ * btrfs_get_chunk_map() - Find the mapping containing the given logical extent.
+ * @logical: Logical block offset in bytes.
+ * @length: Length of extent in bytes.
+ *
+ * Return: Chunk mapping or ERR_PTR.
+ */
+struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
+ u64 logical, u64 length);
static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
int index)
--
2.18.0
* [PATCH v6 6/6] Btrfs: support swap files
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
` (4 preceding siblings ...)
2018-09-07 7:39 ` [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static Omar Sandoval
@ 2018-09-07 7:39 ` Omar Sandoval
2018-09-07 8:39 ` Nikolay Borisov
5 siblings, 1 reply; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 7:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: kernel-team, linux-mm
From: Omar Sandoval <osandov@fb.com>
Implement the swap file a_ops on Btrfs. Activation needs to make sure
that the file can be used as a swap file, which currently means it must
be fully allocated as nocow with no compression on one device. It must
also do the proper tracking so that ioctls will not interfere with the
swap file. Deactivation clears this tracking.
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
fs/btrfs/inode.c | 316 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 316 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9357a19d2bff..55aba2d7074c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -27,6 +27,7 @@
#include <linux/uio.h>
#include <linux/magic.h>
#include <linux/iversion.h>
+#include <linux/swap.h>
#include <asm/unaligned.h>
#include "ctree.h"
#include "disk-io.h"
@@ -10437,6 +10438,319 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
}
}
+/*
+ * Add an entry indicating a block group or device which is pinned by a
+ * swapfile. Returns 0 on success, 1 if there is already an entry for it, or a
+ * negative errno on failure.
+ */
+static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr,
+ bool is_block_group)
+{
+ struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+ struct btrfs_swapfile_pin *sp, *entry;
+ struct rb_node **p;
+ struct rb_node *parent = NULL;
+
+ sp = kmalloc(sizeof(*sp), GFP_NOFS);
+ if (!sp)
+ return -ENOMEM;
+ sp->ptr = ptr;
+ sp->inode = inode;
+ sp->is_block_group = is_block_group;
+
+ spin_lock(&fs_info->swapfile_pins_lock);
+ p = &fs_info->swapfile_pins.rb_node;
+ while (*p) {
+ parent = *p;
+ entry = rb_entry(parent, struct btrfs_swapfile_pin, node);
+ if (sp->ptr < entry->ptr ||
+ (sp->ptr == entry->ptr && sp->inode < entry->inode)) {
+ p = &(*p)->rb_left;
+ } else if (sp->ptr > entry->ptr ||
+ (sp->ptr == entry->ptr && sp->inode > entry->inode)) {
+ p = &(*p)->rb_right;
+ } else {
+ spin_unlock(&fs_info->swapfile_pins_lock);
+ kfree(sp);
+ return 1;
+ }
+ }
+ rb_link_node(&sp->node, parent, p);
+ rb_insert_color(&sp->node, &fs_info->swapfile_pins);
+ spin_unlock(&fs_info->swapfile_pins_lock);
+ return 0;
+}
+
+static void btrfs_free_swapfile_pins(struct inode *inode)
+{
+ struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+ struct btrfs_swapfile_pin *sp;
+ struct rb_node *node, *next;
+
+ spin_lock(&fs_info->swapfile_pins_lock);
+ node = rb_first(&fs_info->swapfile_pins);
+ while (node) {
+ next = rb_next(node);
+ sp = rb_entry(node, struct btrfs_swapfile_pin, node);
+ if (sp->inode == inode) {
+ rb_erase(&sp->node, &fs_info->swapfile_pins);
+ if (sp->is_block_group)
+ btrfs_put_block_group(sp->ptr);
+ kfree(sp);
+ }
+ node = next;
+ }
+ spin_unlock(&fs_info->swapfile_pins_lock);
+}
+
+struct btrfs_swap_info {
+ u64 start;
+ u64 block_start;
+ u64 block_len;
+ u64 lowest_ppage;
+ u64 highest_ppage;
+ unsigned long nr_pages;
+ int nr_extents;
+};
+
+static int btrfs_add_swap_extent(struct swap_info_struct *sis,
+ struct btrfs_swap_info *bsi)
+{
+ unsigned long nr_pages;
+ u64 first_ppage, first_ppage_reported, next_ppage;
+ int ret;
+
+ first_ppage = ALIGN(bsi->block_start, PAGE_SIZE) >> PAGE_SHIFT;
+ next_ppage = ALIGN_DOWN(bsi->block_start + bsi->block_len,
+ PAGE_SIZE) >> PAGE_SHIFT;
+
+ if (first_ppage >= next_ppage)
+ return 0;
+ nr_pages = next_ppage - first_ppage;
+
+ first_ppage_reported = first_ppage;
+ if (bsi->start == 0)
+ first_ppage_reported++;
+ if (bsi->lowest_ppage > first_ppage_reported)
+ bsi->lowest_ppage = first_ppage_reported;
+ if (bsi->highest_ppage < (next_ppage - 1))
+ bsi->highest_ppage = next_ppage - 1;
+
+ ret = add_swap_extent(sis, bsi->nr_pages, nr_pages, first_ppage);
+ if (ret < 0)
+ return ret;
+ bsi->nr_extents += ret;
+ bsi->nr_pages += nr_pages;
+ return 0;
+}
+
+static void btrfs_swap_deactivate(struct file *file)
+{
+ struct inode *inode = file_inode(file);
+
+ btrfs_free_swapfile_pins(inode);
+ atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
+}
+
+static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+ sector_t *span)
+{
+ struct inode *inode = file_inode(file);
+ struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+ struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+ struct extent_state *cached_state = NULL;
+ struct extent_map *em = NULL;
+ struct btrfs_device *device = NULL;
+ struct btrfs_swap_info bsi = {
+ .lowest_ppage = (sector_t)-1ULL,
+ };
+ int ret = 0;
+ u64 isize = inode->i_size;
+ u64 start;
+
+ /*
+ * If the swap file was just created, make sure delalloc is done. If the
+ * file changes again after this, the user is doing something stupid and
+ * we don't really care.
+ */
+ ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+ if (ret)
+ return ret;
+
+ /*
+ * The inode is locked, so these flags won't change after we check them.
+ */
+ if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
+ btrfs_info(fs_info, "swapfile must not be compressed");
+ return -EINVAL;
+ }
+ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
+ btrfs_info(fs_info, "swapfile must not be copy-on-write");
+ return -EINVAL;
+ }
+
+ /*
+ * Balance or device remove/replace/resize can move stuff around from
+ * under us. The EXCL_OP flag makes sure they aren't running/won't run
+ * concurrently while we are mapping the swap extents, and
+ * fs_info->swapfile_pins prevents them from running while the swap file
+ * is active and moving the extents. Note that this also prevents a
+ * concurrent device add which isn't actually necessary, but it's not
+ * really worth the trouble to allow it.
+ */
+ if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
+ return -EBUSY;
+ /*
+ * Snapshots can create extents which require COW even if NODATACOW is
+ * set. We use this counter to prevent snapshots. We must increment it
+ * before walking the extents because we don't want a concurrent
+ * snapshot to run after we've already checked the extents.
+ */
+ atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
+
+ lock_extent_bits(io_tree, 0, isize - 1, &cached_state);
+ start = 0;
+ while (start < isize) {
+ u64 end, logical_block_start, physical_block_start;
+ struct btrfs_block_group_cache *bg;
+ u64 len = isize - start;
+
+ em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
+ if (IS_ERR(em)) {
+ ret = PTR_ERR(em);
+ goto out;
+ }
+ end = extent_map_end(em);
+
+ if (em->block_start == EXTENT_MAP_HOLE) {
+ btrfs_info(fs_info, "swapfile must not have holes");
+ ret = -EINVAL;
+ goto out;
+ }
+ if (em->block_start == EXTENT_MAP_INLINE) {
+ /*
+ * It's unlikely we'll ever actually find ourselves
+ * here, as a file small enough to fit inline won't be
+ * big enough to store more than the swap header, but in
+ * case something changes in the future, let's catch it
+ * here rather than later.
+ */
+ btrfs_info(fs_info, "swapfile must not be inline");
+ ret = -EINVAL;
+ goto out;
+ }
+ if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
+ btrfs_info(fs_info, "swapfile must not be compressed");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ logical_block_start = em->block_start + (start - em->start);
+ len = min(len, em->len - (start - em->start));
+ free_extent_map(em);
+ em = NULL;
+
+ ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
+ if (ret < 0) {
+ goto out;
+ } else if (ret) {
+ ret = 0;
+ } else {
+ btrfs_info(fs_info, "swapfile must not be copy-on-write");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ em = btrfs_get_chunk_map(fs_info, logical_block_start, len);
+ if (IS_ERR(em)) {
+ ret = PTR_ERR(em);
+ goto out;
+ }
+
+ if (em->map_lookup->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+ btrfs_info(fs_info, "swapfile must have single data profile");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (device == NULL) {
+ device = em->map_lookup->stripes[0].dev;
+ ret = btrfs_add_swapfile_pin(inode, device, false);
+ if (ret == 1)
+ ret = 0;
+ else if (ret)
+ goto out;
+ } else if (device != em->map_lookup->stripes[0].dev) {
+ btrfs_info(fs_info, "swapfile must be on one device");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ physical_block_start = (em->map_lookup->stripes[0].physical +
+ (logical_block_start - em->start));
+ len = min(len, em->len - (logical_block_start - em->start));
+ free_extent_map(em);
+ em = NULL;
+
+ bg = btrfs_lookup_block_group(fs_info, logical_block_start);
+ if (!bg) {
+ btrfs_info(fs_info, "could not find block group containing swapfile");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = btrfs_add_swapfile_pin(inode, bg, true);
+ if (ret) {
+ btrfs_put_block_group(bg);
+ if (ret == 1)
+ ret = 0;
+ else
+ goto out;
+ }
+
+ if (bsi.block_len &&
+ bsi.block_start + bsi.block_len == physical_block_start) {
+ bsi.block_len += len;
+ } else {
+ if (bsi.block_len) {
+ ret = btrfs_add_swap_extent(sis, &bsi);
+ if (ret)
+ goto out;
+ }
+ bsi.start = start;
+ bsi.block_start = physical_block_start;
+ bsi.block_len = len;
+ }
+
+ start = end;
+ }
+
+ if (bsi.block_len)
+ ret = btrfs_add_swap_extent(sis, &bsi);
+
+out:
+ if (!IS_ERR_OR_NULL(em))
+ free_extent_map(em);
+
+ unlock_extent_cached(io_tree, 0, isize - 1, &cached_state);
+
+ if (ret)
+ btrfs_swap_deactivate(file);
+
+ clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+
+ if (ret)
+ return ret;
+
+ if (device)
+ sis->bdev = device->bdev;
+ *span = bsi.highest_ppage - bsi.lowest_ppage + 1;
+ sis->max = bsi.nr_pages;
+ sis->pages = bsi.nr_pages - 1;
+ sis->highest_bit = bsi.nr_pages - 1;
+ return bsi.nr_extents;
+}
+
static const struct inode_operations btrfs_dir_inode_operations = {
.getattr = btrfs_getattr,
.lookup = btrfs_lookup,
@@ -10514,6 +10828,8 @@ static const struct address_space_operations btrfs_aops = {
.releasepage = btrfs_releasepage,
.set_page_dirty = btrfs_set_page_dirty,
.error_remove_page = generic_error_remove_page,
+ .swap_activate = btrfs_swap_activate,
+ .swap_deactivate = btrfs_swap_deactivate,
};
static const struct address_space_operations btrfs_symlink_aops = {
--
2.18.0
* Re: [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static
2018-09-07 7:39 ` [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static Omar Sandoval
@ 2018-09-07 7:54 ` Nikolay Borisov
2018-09-07 10:26 ` David Sterba
0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2018-09-07 7:54 UTC (permalink / raw)
To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, linux-mm
On 7.09.2018 10:39, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> The Btrfs swap code is going to need it, so give it a btrfs_ prefix and
> make it non-static.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
One minor nit but otherwise:
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
> fs/btrfs/volumes.c | 22 +++++++++++-----------
> fs/btrfs/volumes.h | 9 +++++++++
> 2 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 207e36b70d9b..514932c47bcd 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2714,8 +2714,8 @@ static int btrfs_del_sys_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> return ret;
> }
>
> -static struct extent_map *get_chunk_map(struct btrfs_fs_info *fs_info,
> - u64 logical, u64 length)
> +struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
> + u64 logical, u64 length)
> {
> struct extent_map_tree *em_tree;
> struct extent_map *em;
> @@ -2752,7 +2752,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> int i, ret = 0;
> struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>
> - em = get_chunk_map(fs_info, chunk_offset, 1);
> + em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> if (IS_ERR(em)) {
> /*
> * This is a logic error, but we don't want to just rely on the
> @@ -4897,7 +4897,7 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
> int i = 0;
> int ret = 0;
>
> - em = get_chunk_map(fs_info, chunk_offset, chunk_size);
> + em = btrfs_get_chunk_map(fs_info, chunk_offset, chunk_size);
> if (IS_ERR(em))
> return PTR_ERR(em);
>
> @@ -5039,7 +5039,7 @@ int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> int miss_ndevs = 0;
> int i;
>
> - em = get_chunk_map(fs_info, chunk_offset, 1);
> + em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> if (IS_ERR(em))
> return 1;
>
> @@ -5099,7 +5099,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
> struct map_lookup *map;
> int ret;
>
> - em = get_chunk_map(fs_info, logical, len);
> + em = btrfs_get_chunk_map(fs_info, logical, len);
> if (IS_ERR(em))
> /*
> * We could return errors for these cases, but that could get
> @@ -5145,7 +5145,7 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
> struct map_lookup *map;
> unsigned long len = fs_info->sectorsize;
>
> - em = get_chunk_map(fs_info, logical, len);
> + em = btrfs_get_chunk_map(fs_info, logical, len);
>
> if (!WARN_ON(IS_ERR(em))) {
> map = em->map_lookup;
> @@ -5162,7 +5162,7 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
> struct map_lookup *map;
> int ret = 0;
>
> - em = get_chunk_map(fs_info, logical, len);
> + em = btrfs_get_chunk_map(fs_info, logical, len);
>
> if(!WARN_ON(IS_ERR(em))) {
> map = em->map_lookup;
> @@ -5321,7 +5321,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
> /* discard always return a bbio */
> ASSERT(bbio_ret);
>
> - em = get_chunk_map(fs_info, logical, length);
> + em = btrfs_get_chunk_map(fs_info, logical, length);
> if (IS_ERR(em))
> return PTR_ERR(em);
>
> @@ -5647,7 +5647,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> return __btrfs_map_block_for_discard(fs_info, logical,
> *length, bbio_ret);
>
> - em = get_chunk_map(fs_info, logical, *length);
> + em = btrfs_get_chunk_map(fs_info, logical, *length);
> if (IS_ERR(em))
> return PTR_ERR(em);
>
> @@ -5946,7 +5946,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
> u64 rmap_len;
> int i, j, nr = 0;
>
> - em = get_chunk_map(fs_info, chunk_start, 1);
> + em = btrfs_get_chunk_map(fs_info, chunk_start, 1);
> if (IS_ERR(em))
> return -EIO;
>
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 23e9285d88de..d68c8a05a774 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -465,6 +465,15 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
> int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
> u64 chunk_offset, u64 chunk_size);
> int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
> +/**
> + * btrfs_get_chunk_map() - Find the mapping containing the given logical extent.
> + * @logical: Logical block offset in bytes.
> + * @length: Length of extent in bytes.
> + *
> + * Return: Chunk mapping or ERR_PTR.
> + */
nit: I think the accepted style is to have the documentation before the
definition and not the declaration of the function. I guess David can
fix that up when the patch is merged.
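In other words, keep the text as-is but move it above the function body
in volumes.c, roughly:

/**
 * btrfs_get_chunk_map() - Find the mapping containing the given logical extent.
 * @logical: Logical block offset in bytes.
 * @length: Length of extent in bytes.
 *
 * Return: Chunk mapping or ERR_PTR.
 */
struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
				       u64 logical, u64 length)
{
	...
}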
> +struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
> + u64 logical, u64 length);
>
> static inline void btrfs_dev_stat_inc(struct btrfs_device *dev,
> int index)
>
* Re: [PATCH v6 6/6] Btrfs: support swap files
2018-09-07 7:39 ` [PATCH v6 6/6] Btrfs: support swap files Omar Sandoval
@ 2018-09-07 8:39 ` Nikolay Borisov
2018-09-07 18:04 ` Omar Sandoval
0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2018-09-07 8:39 UTC (permalink / raw)
To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, linux-mm
On 7.09.2018 10:39, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> Implement the swap file a_ops on Btrfs. Activation needs to make sure
> that the file can be used as a swap file, which currently means it must
> be fully allocated as nocow with no compression on one device. It must
> also do the proper tracking so that ioctls will not interfere with the
> swap file. Deactivation clears this tracking.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> fs/btrfs/inode.c | 316 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 316 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 9357a19d2bff..55aba2d7074c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -27,6 +27,7 @@
> #include <linux/uio.h>
> #include <linux/magic.h>
> #include <linux/iversion.h>
> +#include <linux/swap.h>
> #include <asm/unaligned.h>
> #include "ctree.h"
> #include "disk-io.h"
> @@ -10437,6 +10438,319 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
> }
> }
>
> +/*
> + * Add an entry indicating a block group or device which is pinned by a
> + * swapfile. Returns 0 on success, 1 if there is already an entry for it, or a
> + * negative errno on failure.
> + */
> +static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr,
> + bool is_block_group)
> +{
> + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> + struct btrfs_swapfile_pin *sp, *entry;
> + struct rb_node **p;
> + struct rb_node *parent = NULL;
> +
> + sp = kmalloc(sizeof(*sp), GFP_NOFS);
> + if (!sp)
> + return -ENOMEM;
> + sp->ptr = ptr;
> + sp->inode = inode;
> + sp->is_block_group = is_block_group;
> +
> + spin_lock(&fs_info->swapfile_pins_lock);
> + p = &fs_info->swapfile_pins.rb_node;
> + while (*p) {
> + parent = *p;
> + entry = rb_entry(parent, struct btrfs_swapfile_pin, node);
> + if (sp->ptr < entry->ptr ||
> + (sp->ptr == entry->ptr && sp->inode < entry->inode)) {
> + p = &(*p)->rb_left;
> + } else if (sp->ptr > entry->ptr ||
> + (sp->ptr == entry->ptr && sp->inode > entry->inode)) {
> + p = &(*p)->rb_right;
> + } else {
> + spin_unlock(&fs_info->swapfile_pins_lock);
> + kfree(sp);
> + return 1;
> + }
I have to admit this is a creative use of pointers, but I dislike it:
1. You are not really doing an interval tree of any sort, so an rb-tree
seems like overkill. How many block groups/devices do you expect to be
in the tree, i.e. how many swap files per filesystem, for the O(log n)
search behavior to really matter? Why not a simple linked list and just
an equality comparison of pointers (rough sketch below)?
2. The code itself admits that using pointers for a lt/gt comparison is
a hack, since when the pointers match you fall back to comparing the
inode pointers.
3. There is a discrepancy between the keys used for insertion (ptr +
inode) and for deletion (just the inode).
At the very least this hack needs to be mentioned in the changelog.
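For 1., something like the following is what I have in mind. This is a
completely untested sketch, and it assumes fs_info->swapfile_pins
becomes a plain list protected by the same spinlock instead of an
rb_root:

struct btrfs_swapfile_pin {
	struct list_head list;
	void *ptr;		/* block group or device */
	struct inode *inode;
	bool is_block_group;
};

bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr)
{
	struct btrfs_swapfile_pin *sp;
	bool found = false;

	spin_lock(&fs_info->swapfile_pins_lock);
	list_for_each_entry(sp, &fs_info->swapfile_pins, list) {
		/* A simple pointer match replaces the rb-tree ordering. */
		if (sp->ptr == ptr) {
			found = true;
			break;
		}
	}
	spin_unlock(&fs_info->swapfile_pins_lock);
	return found;
}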
> + }
> + rb_link_node(&sp->node, parent, p);
> + rb_insert_color(&sp->node, &fs_info->swapfile_pins);
> + spin_unlock(&fs_info->swapfile_pins_lock);
> + return 0;
> +}
> +
> +static void btrfs_free_swapfile_pins(struct inode *inode)
> +{
> + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> + struct btrfs_swapfile_pin *sp;
> + struct rb_node *node, *next;
> +
> + spin_lock(&fs_info->swapfile_pins_lock);
> + node = rb_first(&fs_info->swapfile_pins);
> + while (node) {
> + next = rb_next(node);
> + sp = rb_entry(node, struct btrfs_swapfile_pin, node);
> + if (sp->inode == inode) {
> + rb_erase(&sp->node, &fs_info->swapfile_pins);
> + if (sp->is_block_group)
> + btrfs_put_block_group(sp->ptr);
> + kfree(sp);
> + }
> + node = next;
> + }
> + spin_unlock(&fs_info->swapfile_pins_lock);
> +}
> +
> +struct btrfs_swap_info {
> + u64 start;
> + u64 block_start;
> + u64 block_len;
> + u64 lowest_ppage;
> + u64 highest_ppage;
> + unsigned long nr_pages;
> + int nr_extents;
> +};
> +
> +static int btrfs_add_swap_extent(struct swap_info_struct *sis,
> + struct btrfs_swap_info *bsi)
> +{
> + unsigned long nr_pages;
> + u64 first_ppage, first_ppage_reported, next_ppage;
what does the extra p stand for - physical?
> + int ret;
> +
> + first_ppage = ALIGN(bsi->block_start, PAGE_SIZE) >> PAGE_SHIFT;
> + next_ppage = ALIGN_DOWN(bsi->block_start + bsi->block_len,
> + PAGE_SIZE) >> PAGE_SHIFT;
> +
> + if (first_ppage >= next_ppage)
> + return 0;
> + nr_pages = next_ppage - first_ppage;
> +
> + first_ppage_reported = first_ppage;
> + if (bsi->start == 0)
> + first_ppage_reported++;
> + if (bsi->lowest_ppage > first_ppage_reported)
> + bsi->lowest_ppage = first_ppage_reported;
> + if (bsi->highest_ppage < (next_ppage - 1))
> + bsi->highest_ppage = next_ppage - 1;
> +
> + ret = add_swap_extent(sis, bsi->nr_pages, nr_pages, first_ppage);
> + if (ret < 0)
> + return ret;
> + bsi->nr_extents += ret;
> + bsi->nr_pages += nr_pages;
> + return 0;
> +}
> +
> +static void btrfs_swap_deactivate(struct file *file)
> +{
> + struct inode *inode = file_inode(file);
> +
> + btrfs_free_swapfile_pins(inode);
> + atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
> +}
> +
> +static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
> + sector_t *span)
> +{
> + struct inode *inode = file_inode(file);
> + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> + struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> + struct extent_state *cached_state = NULL;
> + struct extent_map *em = NULL;
> + struct btrfs_device *device = NULL;
> + struct btrfs_swap_info bsi = {
> + .lowest_ppage = (sector_t)-1ULL,
> + };
> + int ret = 0;
> + u64 isize = inode->i_size;
> + u64 start;
> +
> + /*
> + * If the swap file was just created, make sure delalloc is done. If the
> + * file changes again after this, the user is doing something stupid and
> + * we don't really care.
> + */
> + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
> + if (ret)
> + return ret;
> +
> + /*
> + * The inode is locked, so these flags won't change after we check them.
> + */
> + if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
> + btrfs_info(fs_info, "swapfile must not be compressed");
> + return -EINVAL;
> + }
> + if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
> + btrfs_info(fs_info, "swapfile must not be copy-on-write");
> + return -EINVAL;
> + }
> +
> + /*
> + * Balance or device remove/replace/resize can move stuff around from
> + * under us. The EXCL_OP flag makes sure they aren't running/won't run
> + * concurrently while we are mapping the swap extents, and
> + * fs_info->swapfile_pins prevents them from running while the swap file
> + * is active and moving the extents. Note that this also prevents a
> + * concurrent device add which isn't actually necessary, but it's not
> + * really worth the trouble to allow it.
> + */
> + if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
> + return -EBUSY;
> + /*
> + * Snapshots can create extents which require COW even if NODATACOW is
> + * set. We use this counter to prevent snapshots. We must increment it
> + * before walking the extents because we don't want a concurrent
> + * snapshot to run after we've already checked the extents.
> + */
> + atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
> +
> + lock_extent_bits(io_tree, 0, isize - 1, &cached_state);
> + start = 0;
> + while (start < isize) {
> + u64 end, logical_block_start, physical_block_start;
> + struct btrfs_block_group_cache *bg;
> + u64 len = isize - start;
> +
> + em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
> + if (IS_ERR(em)) {
> + ret = PTR_ERR(em);
> + goto out;
> + }
> + end = extent_map_end(em);
> +
> + if (em->block_start == EXTENT_MAP_HOLE) {
> + btrfs_info(fs_info, "swapfile must not have holes");
> + ret = -EINVAL;
> + goto out;
> + }
> + if (em->block_start == EXTENT_MAP_INLINE) {
> + /*
> + * It's unlikely we'll ever actually find ourselves
> + * here, as a file small enough to fit inline won't be
> + * big enough to store more than the swap header, but in
> + * case something changes in the future, let's catch it
> + * here rather than later.
> + */
> + btrfs_info(fs_info, "swapfile must not be inline");
> + ret = -EINVAL;
> + goto out;
> + }
> + if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
> + btrfs_info(fs_info, "swapfile must not be compressed");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + logical_block_start = em->block_start + (start - em->start);
> + len = min(len, em->len - (start - em->start));
> + free_extent_map(em);
> + em = NULL;
> +
> + ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
> + if (ret < 0) {
> + goto out;
> + } else if (ret) {
> + ret = 0;
> + } else {
> + btrfs_info(fs_info, "swapfile must not be copy-on-write");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + em = btrfs_get_chunk_map(fs_info, logical_block_start, len);
> + if (IS_ERR(em)) {
> + ret = PTR_ERR(em);
> + goto out;
> + }
> +
> + if (em->map_lookup->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> + btrfs_info(fs_info, "swapfile must have single data profile");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (device == NULL) {
> + device = em->map_lookup->stripes[0].dev;
> + ret = btrfs_add_swapfile_pin(inode, device, false);
> + if (ret == 1)
> + ret = 0;
> + else if (ret)
> + goto out;
> + } else if (device != em->map_lookup->stripes[0].dev) {
> + btrfs_info(fs_info, "swapfile must be on one device");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + physical_block_start = (em->map_lookup->stripes[0].physical +
> + (logical_block_start - em->start));
> + len = min(len, em->len - (logical_block_start - em->start));
> + free_extent_map(em);
> + em = NULL;
> +
> + bg = btrfs_lookup_block_group(fs_info, logical_block_start);
> + if (!bg) {
> + btrfs_info(fs_info, "could not find block group containing swapfile");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + ret = btrfs_add_swapfile_pin(inode, bg, true);
> + if (ret) {
> + btrfs_put_block_group(bg);
> + if (ret == 1)
> + ret = 0;
> + else
> + goto out;
> + }
> +
> + if (bsi.block_len &&
> + bsi.block_start + bsi.block_len == physical_block_start) {
> + bsi.block_len += len;
> + } else {
> + if (bsi.block_len) {
> + ret = btrfs_add_swap_extent(sis, &bsi);
> + if (ret)
> + goto out;
> + }
> + bsi.start = start;
> + bsi.block_start = physical_block_start;
> + bsi.block_len = len;
> + }
> +
> + start = end;
> + }
> +
> + if (bsi.block_len)
> + ret = btrfs_add_swap_extent(sis, &bsi);
> +
> +out:
> + if (!IS_ERR_OR_NULL(em))
> + free_extent_map(em);
> +
> + unlock_extent_cached(io_tree, 0, isize - 1, &cached_state);
> +
> + if (ret)
> + btrfs_swap_deactivate(file);
> +
> + clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
> +
> + if (ret)
> + return ret;
> +
> + if (device)
> + sis->bdev = device->bdev;
> + *span = bsi.highest_ppage - bsi.lowest_ppage + 1;
> + sis->max = bsi.nr_pages;
> + sis->pages = bsi.nr_pages - 1;
> + sis->highest_bit = bsi.nr_pages - 1;
> + return bsi.nr_extents;
> +}
> +
> static const struct inode_operations btrfs_dir_inode_operations = {
> .getattr = btrfs_getattr,
> .lookup = btrfs_lookup,
> @@ -10514,6 +10828,8 @@ static const struct address_space_operations btrfs_aops = {
> .releasepage = btrfs_releasepage,
> .set_page_dirty = btrfs_set_page_dirty,
> .error_remove_page = generic_error_remove_page,
> + .swap_activate = btrfs_swap_activate,
> + .swap_deactivate = btrfs_swap_deactivate,
> };
>
> static const struct address_space_operations btrfs_symlink_aops = {
>
* Re: [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static
2018-09-07 7:54 ` Nikolay Borisov
@ 2018-09-07 10:26 ` David Sterba
0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2018-09-07 10:26 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: Omar Sandoval, linux-btrfs, kernel-team, linux-mm
On Fri, Sep 07, 2018 at 10:54:52AM +0300, Nikolay Borisov wrote:
>
>
> On 7.09.2018 10:39, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> >
> > The Btrfs swap code is going to need it, so give it a btrfs_ prefix and
> > make it non-static.
> >
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
>
> One minor nit but otherwise:
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
> > ---
> > fs/btrfs/volumes.c | 22 +++++++++++-----------
> > fs/btrfs/volumes.h | 9 +++++++++
> > 2 files changed, 20 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 207e36b70d9b..514932c47bcd 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2714,8 +2714,8 @@ static int btrfs_del_sys_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> > return ret;
> > }
> >
> > -static struct extent_map *get_chunk_map(struct btrfs_fs_info *fs_info,
> > - u64 logical, u64 length)
> > +struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
> > + u64 logical, u64 length)
> > {
> > struct extent_map_tree *em_tree;
> > struct extent_map *em;
> > @@ -2752,7 +2752,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> > int i, ret = 0;
> > struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> >
> > - em = get_chunk_map(fs_info, chunk_offset, 1);
> > + em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> > if (IS_ERR(em)) {
> > /*
> > * This is a logic error, but we don't want to just rely on the
> > @@ -4897,7 +4897,7 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
> > int i = 0;
> > int ret = 0;
> >
> > - em = get_chunk_map(fs_info, chunk_offset, chunk_size);
> > + em = btrfs_get_chunk_map(fs_info, chunk_offset, chunk_size);
> > if (IS_ERR(em))
> > return PTR_ERR(em);
> >
> > @@ -5039,7 +5039,7 @@ int btrfs_chunk_readonly(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> > int miss_ndevs = 0;
> > int i;
> >
> > - em = get_chunk_map(fs_info, chunk_offset, 1);
> > + em = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> > if (IS_ERR(em))
> > return 1;
> >
> > @@ -5099,7 +5099,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
> > struct map_lookup *map;
> > int ret;
> >
> > - em = get_chunk_map(fs_info, logical, len);
> > + em = btrfs_get_chunk_map(fs_info, logical, len);
> > if (IS_ERR(em))
> > /*
> > * We could return errors for these cases, but that could get
> > @@ -5145,7 +5145,7 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
> > struct map_lookup *map;
> > unsigned long len = fs_info->sectorsize;
> >
> > - em = get_chunk_map(fs_info, logical, len);
> > + em = btrfs_get_chunk_map(fs_info, logical, len);
> >
> > if (!WARN_ON(IS_ERR(em))) {
> > map = em->map_lookup;
> > @@ -5162,7 +5162,7 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
> > struct map_lookup *map;
> > int ret = 0;
> >
> > - em = get_chunk_map(fs_info, logical, len);
> > + em = btrfs_get_chunk_map(fs_info, logical, len);
> >
> > if(!WARN_ON(IS_ERR(em))) {
> > map = em->map_lookup;
> > @@ -5321,7 +5321,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
> > /* discard always return a bbio */
> > ASSERT(bbio_ret);
> >
> > - em = get_chunk_map(fs_info, logical, length);
> > + em = btrfs_get_chunk_map(fs_info, logical, length);
> > if (IS_ERR(em))
> > return PTR_ERR(em);
> >
> > @@ -5647,7 +5647,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> > return __btrfs_map_block_for_discard(fs_info, logical,
> > *length, bbio_ret);
> >
> > - em = get_chunk_map(fs_info, logical, *length);
> > + em = btrfs_get_chunk_map(fs_info, logical, *length);
> > if (IS_ERR(em))
> > return PTR_ERR(em);
> >
> > @@ -5946,7 +5946,7 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start,
> > u64 rmap_len;
> > int i, j, nr = 0;
> >
> > - em = get_chunk_map(fs_info, chunk_start, 1);
> > + em = btrfs_get_chunk_map(fs_info, chunk_start, 1);
> > if (IS_ERR(em))
> > return -EIO;
> >
> > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> > index 23e9285d88de..d68c8a05a774 100644
> > --- a/fs/btrfs/volumes.h
> > +++ b/fs/btrfs/volumes.h
> > @@ -465,6 +465,15 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
> > int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
> > u64 chunk_offset, u64 chunk_size);
> > int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
> > +/**
> > + * btrfs_get_chunk_map() - Find the mapping containing the given logical extent.
> > + * @logical: Logical block offset in bytes.
> > + * @length: Length of extent in bytes.
> > + *
> > + * Return: Chunk mapping or ERR_PTR.
> > + */
>
> nit: I think the accepted style is to have the documentation before the
> definition of the function rather than before the declaration. I guess
> David can fix that when the patch is being merged.
Yes, the comment should be at the function body; I can fix that.
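For reference, this is roughly what the move would look like in volumes.c
(the comment text is copied verbatim from the hunk above, only its placement
changes; the function body is elided, and @fs_info is left undocumented just
as in the original comment):

/**
 * btrfs_get_chunk_map() - Find the mapping containing the given logical extent.
 * @logical: Logical block offset in bytes.
 * @length: Length of extent in bytes.
 *
 * Return: Chunk mapping or ERR_PTR.
 */
struct extent_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
				       u64 logical, u64 length)
{
	...
}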
* Re: [PATCH v6 6/6] Btrfs: support swap files
2018-09-07 8:39 ` Nikolay Borisov
@ 2018-09-07 18:04 ` Omar Sandoval
2018-09-07 18:17 ` Omar Sandoval
0 siblings, 1 reply; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 18:04 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs, kernel-team, linux-mm
On Fri, Sep 07, 2018 at 11:39:25AM +0300, Nikolay Borisov wrote:
>
>
> On 7.09.2018 10:39, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> >
> > Implement the swap file a_ops on Btrfs. Activation needs to make sure
> > that the file can be used as a swap file, which currently means it must
> > be fully allocated as nocow with no compression on one device. It must
> > also do the proper tracking so that ioctls will not interfere with the
> > swap file. Deactivation clears this tracking.
> >
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> > fs/btrfs/inode.c | 316 +++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 316 insertions(+)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 9357a19d2bff..55aba2d7074c 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -27,6 +27,7 @@
> > #include <linux/uio.h>
> > #include <linux/magic.h>
> > #include <linux/iversion.h>
> > +#include <linux/swap.h>
> > #include <asm/unaligned.h>
> > #include "ctree.h"
> > #include "disk-io.h"
> > @@ -10437,6 +10438,319 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
> > }
> > }
> >
> > +/*
> > + * Add an entry indicating a block group or device which is pinned by a
> > + * swapfile. Returns 0 on success, 1 if there is already an entry for it, or a
> > + * negative errno on failure.
> > + */
> > +static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr,
> > + bool is_block_group)
> > +{
> > + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> > + struct btrfs_swapfile_pin *sp, *entry;
> > + struct rb_node **p;
> > + struct rb_node *parent = NULL;
> > +
> > + sp = kmalloc(sizeof(*sp), GFP_NOFS);
> > + if (!sp)
> > + return -ENOMEM;
> > + sp->ptr = ptr;
> > + sp->inode = inode;
> > + sp->is_block_group = is_block_group;
> > +
> > + spin_lock(&fs_info->swapfile_pins_lock);
> > + p = &fs_info->swapfile_pins.rb_node;
> > + while (*p) {
> > + parent = *p;
> > + entry = rb_entry(parent, struct btrfs_swapfile_pin, node);
> > + if (sp->ptr < entry->ptr ||
> > + (sp->ptr == entry->ptr && sp->inode < entry->inode)) {
> > + p = &(*p)->rb_left;
> > + } else if (sp->ptr > entry->ptr ||
> > + (sp->ptr == entry->ptr && sp->inode > entry->inode)) {
> > + p = &(*p)->rb_right;
> > + } else {
> > + spin_unlock(&fs_info->swapfile_pins_lock);
> > + kfree(sp);
> > + return 1;
> > + }
>
>
> I have to admit this is creative use of pointers, but I dislike it:
>
> 1. You are not really doing an interval tree of any sort, so an rbtree
> seems a bit of overkill. How many block groups/devices do you expect to
> have in the tree, i.e. how many swap files per filesystem, for the
> O(log n) search behavior to really matter? Why not a simple linked list
> and just an equality comparison of pointers?
We know there's at least one block group per gigabyte of swap, but there
can be many more if the file is fragmented. We could probably get away
with a linked list in most cases, but the number of entries is only
bounded by the size of the filesystem * number of swapfiles. With a
linked list, checking n block groups for balance becomes O(n^2) instead
of O(n * log n), so I'd rather be on the safe side here. The rbtree
manipulation isn't that much more complicated than using a linked list,
after all.
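To make the asymptotics concrete, here's a sketch of the balance-side check
(this is not the actual relocation code; the function name, bg_list, and the
loop shape are made up for illustration, and btrfs_pinned_by_swapfile is the
lookup helper this series adds):

/* Sketch only: bg_list stands in for whatever set of block groups balance walks. */
static void skip_pinned_block_groups_sketch(struct btrfs_fs_info *fs_info,
					    struct list_head *bg_list)
{
	struct btrfs_block_group_cache *bg;

	/*
	 * One O(log n) rbtree lookup per block group, so checking n block
	 * groups costs O(n log n). A linked list of pins would make each
	 * check O(n) and the whole pass O(n^2).
	 */
	list_for_each_entry(bg, bg_list, list) {
		if (btrfs_pinned_by_swapfile(fs_info, bg))
			continue;	/* backs an active swapfile, skip it */
		/* ... consider bg for relocation ... */
	}
}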
> 2. The code itself admits that using pointers for less-than/greater-than
> comparison is a hack, since when the pointers match you fall back to
> comparing the inode pointers.
It's not a fallback. A block group or device can contain more than one
swapfile, so it's really a separate entry.
> 3. There is a discrepancy between the keys used for insertion
> (ptr + inode) and for deletion (just the inode).
Well, yeah, we add an entry that says block group/device X is pinned by
inode Y, and we delete all of the entries pinned by inode Y.
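Roughly like this (a sketch of the deletion side; the _sketch name is made
up and the exact body isn't quoted in this mail, but the fields and locking
match the hunks above):

static void free_swapfile_pins_sketch(struct btrfs_fs_info *fs_info,
				      struct inode *inode)
{
	struct btrfs_swapfile_pin *sp;
	struct rb_node *node, *next;

	spin_lock(&fs_info->swapfile_pins_lock);
	node = rb_first(&fs_info->swapfile_pins);
	while (node) {
		next = rb_next(node);
		sp = rb_entry(node, struct btrfs_swapfile_pin, node);
		/* Drop every entry pinned by this swapfile's inode. */
		if (sp->inode == inode) {
			rb_erase(&sp->node, &fs_info->swapfile_pins);
			/*
			 * Activation appears to keep a block group ref for
			 * each pin (see the earlier hunk), so drop it here.
			 */
			if (sp->is_block_group)
				btrfs_put_block_group(sp->ptr);
			kfree(sp);
		}
		node = next;
	}
	spin_unlock(&fs_info->swapfile_pins_lock);
}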
> At the very least, this hack needs to be mentioned in the changelog.
I'll add these comments:
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e37ce40db380..1c258ee4be24 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -719,6 +719,11 @@ struct btrfs_delayed_root;
/*
* Block group or device which contains an active swapfile. Used for preventing
* unsafe operations while a swapfile is active.
+ *
+ * These are sorted on (ptr, inode) (note that a block group or device can
+ * contain more than one swapfile). We compare the pointer values because we
+ * don't actually care what the object is, we just need a quick check whether
+ * the object exists in the rbtree.
*/
struct btrfs_swapfile_pin {
struct rb_node node;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 55aba2d7074c..e103e81c6533 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10481,6 +10481,7 @@ static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr,
return 0;
}
+/* Free all of the entries pinned by this swapfile. */
static void btrfs_free_swapfile_pins(struct inode *inode)
{
struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 514932c47bcd..062ad86358ad 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7545,6 +7545,10 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
return ret;
}
+/*
+ * Check whether the given block group or device is pinned by any inode being
+ * used as a swapfile.
+ */
bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr)
{
struct btrfs_swapfile_pin *sp;
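(The hunk above cuts off before the search loop; for completeness, here's a
sketch of how such a pointer-keyed lookup goes. The _sketch name is made up
and the exact body may differ from the patch.)

bool pinned_by_swapfile_sketch(struct btrfs_fs_info *fs_info, void *ptr)
{
	struct btrfs_swapfile_pin *sp;
	struct rb_node *node;
	bool pinned = false;

	spin_lock(&fs_info->swapfile_pins_lock);
	node = fs_info->swapfile_pins.rb_node;
	while (node) {
		sp = rb_entry(node, struct btrfs_swapfile_pin, node);
		/* Compare raw pointer values, matching the insertion order. */
		if (ptr < sp->ptr) {
			node = node->rb_left;
		} else if (ptr > sp->ptr) {
			node = node->rb_right;
		} else {
			pinned = true;
			break;
		}
	}
	spin_unlock(&fs_info->swapfile_pins_lock);
	return pinned;
}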
* Re: [PATCH v6 6/6] Btrfs: support swap files
2018-09-07 18:04 ` Omar Sandoval
@ 2018-09-07 18:17 ` Omar Sandoval
0 siblings, 0 replies; 12+ messages in thread
From: Omar Sandoval @ 2018-09-07 18:17 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs, kernel-team, linux-mm
On Fri, Sep 07, 2018 at 11:04:28AM -0700, Omar Sandoval wrote:
> [...]

Pushed to https://github.com/osandov/linux/tree/btrfs-swap, along with
moving the docstring in patch 5.
Thread overview: 12+ messages
2018-09-07 7:39 [PATCH v6 0/6] Btrfs: implement swap file support Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 2/6] mm: export add_swap_extent() Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 3/6] vfs: update swap_{,de}activate documentation Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 4/6] Btrfs: prevent ioctls from interfering with a swap file Omar Sandoval
2018-09-07 7:39 ` [PATCH v6 5/6] Btrfs: rename get_chunk_map() and make it non-static Omar Sandoval
2018-09-07 7:54 ` Nikolay Borisov
2018-09-07 10:26 ` David Sterba
2018-09-07 7:39 ` [PATCH v6 6/6] Btrfs: support swap files Omar Sandoval
2018-09-07 8:39 ` Nikolay Borisov
2018-09-07 18:04 ` Omar Sandoval
2018-09-07 18:17 ` Omar Sandoval