* [RFC PATCH v3 1/7] direct-io: don't dirty ITER_BVEC pages on read
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 2/7] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval, Ming Lei
Reads through the iov_iter infrastructure for kernel pages shouldn't be
dirtied by the direct I/O code.
This is based on Dave Kleikamp's and Ming Lei's previously posted
patches.
Cc: Ming Lei <ming.lei@canonical.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/direct-io.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..e542ce4 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -120,6 +120,7 @@ struct dio {
spinlock_t bio_lock; /* protects BIO fields below */
int page_errors; /* errno from get_user_pages() */
int is_async; /* is IO async ? */
+ int should_dirty; /* should we mark read pages dirty? */
bool defer_completion; /* defer AIO completion to workqueue? */
int io_error; /* IO error in completion path */
unsigned long refcount; /* direct_io_worker() and bios */
@@ -392,7 +393,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
dio->refcount++;
spin_unlock_irqrestore(&dio->bio_lock, flags);
- if (dio->is_async && dio->rw == READ)
+ if (dio->is_async && dio->rw == READ && dio->should_dirty)
bio_set_pages_dirty(bio);
if (sdio->submit_io)
@@ -463,13 +464,13 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
if (!uptodate)
dio->io_error = -EIO;
- if (dio->is_async && dio->rw == READ) {
+ if (dio->is_async && dio->rw == READ && dio->should_dirty) {
bio_check_pages_dirty(bio); /* transfers ownership */
} else {
bio_for_each_segment_all(bvec, bio, i) {
struct page *page = bvec->bv_page;
- if (dio->rw == READ && !PageCompound(page))
+ if (dio->rw == READ && !PageCompound(page) && dio->should_dirty)
set_page_dirty_lock(page);
page_cache_release(page);
}
@@ -1177,6 +1178,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
dio->inode = inode;
dio->rw = rw;
+ dio->should_dirty = !(iter->type & ITER_BVEC);
/*
* For AIO O_(D)SYNC writes we need to defer completions to a workqueue
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 2/7] nfs: don't dirty ITER_BVEC pages read through direct I/O
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 1/7] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 3/7] swap: use direct I/O for SWP_FILE swap_readpage Omar Sandoval
` (5 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval
As with the generic blockdev code, kernel pages shouldn't be dirtied by
the direct I/O path.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/nfs/direct.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 10bf072..a67fa2c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -88,6 +88,7 @@ struct nfs_direct_req {
struct pnfs_ds_commit_info ds_cinfo; /* Storage for cinfo */
struct work_struct work;
int flags;
+ int should_dirty; /* should we mark read pages dirty? */
#define NFS_ODIRECT_DO_COMMIT (1) /* an unstable reply was received */
#define NFS_ODIRECT_RESCHED_WRITES (2) /* write verification failed */
struct nfs_writeverf verf; /* unstable write verifier */
@@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
struct nfs_page *req = nfs_list_entry(hdr->pages.next);
struct page *page = req->wb_page;
- if (!PageCompound(page) && bytes < hdr->good_bytes)
+ if (!PageCompound(page) && bytes < hdr->good_bytes &&
+ dreq->should_dirty)
set_page_dirty(page);
bytes += req->wb_bytes;
nfs_list_remove_request(req);
@@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
dreq->inode = inode;
dreq->bytes_left = count;
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
+ dreq->should_dirty = !(iter->type & ITER_BVEC);
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
result = PTR_ERR(l_ctx);
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 3/7] swap: use direct I/O for SWP_FILE swap_readpage
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 1/7] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 2/7] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 4/7] vfs: update swap_{,de}activate documentation Omar Sandoval
` (4 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval, Mel Gorman
On Mon, Nov 17, 2014 at 07:48:17AM -0800, Christoph Hellwig wrote:
> With the new iov_iter infrastructure that supprots direct I/O to kernel
> pages please get rid of the ->readpage hack first. I'm still utterly
> disapoined that this crap ever got merged.
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
mm/page_io.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index 955db8b..10715e0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -283,8 +283,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
set_page_writeback(page);
unlock_page(page);
- ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
- &kiocb, &from,
+ ret = mapping->a_ops->direct_IO(WRITE, &kiocb, &from,
kiocb.ki_pos);
if (ret == PAGE_SIZE) {
count_vm_event(PSWPOUT);
@@ -348,12 +347,37 @@ int swap_readpage(struct page *page)
}
if (sis->flags & SWP_FILE) {
+ struct kiocb kiocb;
struct file *swap_file = sis->swap_file;
struct address_space *mapping = swap_file->f_mapping;
+ struct bio_vec bv = {
+ .bv_page = page,
+ .bv_len = PAGE_SIZE,
+ .bv_offset = 0,
+ };
+ struct iov_iter to = {
+ .type = ITER_BVEC | READ,
+ .count = PAGE_SIZE,
+ .iov_offset = 0,
+ .nr_segs = 1,
+ };
+ to.bvec = &bv; /* older gcc versions are broken */
+
+ init_sync_kiocb(&kiocb, swap_file);
+ kiocb.ki_pos = page_file_offset(page);
+ kiocb.ki_nbytes = PAGE_SIZE;
- ret = mapping->a_ops->readpage(swap_file, page);
- if (!ret)
+ ret = mapping->a_ops->direct_IO(READ, &kiocb, &to,
+ kiocb.ki_pos);
+ if (ret == PAGE_SIZE) {
+ SetPageUptodate(page);
count_vm_event(PSWPIN);
+ ret = 0;
+ } else {
+ ClearPageUptodate(page);
+ SetPageError(page);
+ }
+ unlock_page(page);
return ret;
}
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 4/7] vfs: update swap_{,de}activate documentation
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
` (2 preceding siblings ...)
2014-12-10 1:45 ` [RFC PATCH v3 3/7] swap: use direct I/O for SWP_FILE swap_readpage Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 5/7] btrfs: prevent ioctls from interfering with a swap file Omar Sandoval
` (3 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval
Parameters were added to swap_activate in the same patch series that
introduced it without updating the documentation. Additionally, the
documentation claims that non-existent address space operations
swap_{in,out} are used for swap I/O, but it's (now) direct_IO.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
Documentation/filesystems/Locking | 7 ++++---
Documentation/filesystems/vfs.txt | 7 ++++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index b30753c..e72b4c3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -205,7 +205,8 @@ prototypes:
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
- int (*swap_activate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *, struct file *,
+ sector_t *);
int (*swap_deactivate)(struct file *);
locking rules:
@@ -230,8 +231,8 @@ migratepage: yes (both)
launder_page: yes
is_partially_uptodate: yes
error_remove_page: yes
-swap_activate: no
-swap_deactivate: no
+swap_activate: yes
+swap_deactivate: no
->write_begin(), ->write_end(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 20bf204..46ef103 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -600,8 +600,9 @@ struct address_space_operations {
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
- int (*swap_activate)(struct file *);
- int (*swap_deactivate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *, struct file *,
+ sector_t *);
+ void (*swap_deactivate)(struct file *);
};
writepage: called by the VM to write a dirty page to backing store.
@@ -788,7 +789,7 @@ struct address_space_operations {
memory. A return value of zero indicates success,
in which case this file can be used to back swapspace. The
swapspace operations will be proxied to this address space's
- ->swap_{out,in} methods.
+ ->direct_IO method for both reads and writes.
swap_deactivate: Called during swapoff on files where swap_activate
was successful.
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 5/7] btrfs: prevent ioctls from interfering with a swap file
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
` (3 preceding siblings ...)
2014-12-10 1:45 ` [RFC PATCH v3 4/7] vfs: update swap_{,de}activate documentation Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-10 1:45 ` [RFC PATCH v3 6/7] btrfs: add EXTENT_FLAG_SWAPFILE Omar Sandoval
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval
There are several ioctls which can work around constraints enforced by
btrfs_swap_activate and lead to an unsafe situation. We cannot do any of
the following on an active swap file in order to avoid creating
compressed or shared extents:
- chattr -C or +c
- snapshot create
- defrag
- clone
- dedup
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/btrfs/ctree.h | 3 +++
fs/btrfs/disk-io.c | 1 +
fs/btrfs/ioctl.c | 35 +++++++++++++++++++++++++++++++----
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fe69edd..38979b9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1891,6 +1891,9 @@ struct btrfs_root {
int send_in_progress;
struct btrfs_subvolume_writers *subv_writers;
atomic_t will_be_snapshoted;
+
+ /* Number of active swapfiles */
+ atomic_t nr_swapfiles;
};
struct btrfs_ioctl_defrag_range_args {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1bf9f89..60094c4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1265,6 +1265,7 @@ static void __setup_root(u32 nodesize, u32 sectorsize, u32 stripesize,
atomic_set(&root->orphan_inodes, 0);
atomic_set(&root->refs, 1);
atomic_set(&root->will_be_snapshoted, 0);
+ atomic_set(&root->nr_swapfiles, 0);
root->log_transid = 0;
root->log_transid_committed = -1;
root->last_log_commit = 0;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4399f0c..18fa95c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -291,9 +291,11 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
} else {
ip->flags |= BTRFS_INODE_NODATACOW;
}
- } else {
+ } else if (!IS_SWAPFILE(inode)) {
/*
- * Revert back under same assuptions as above
+ * Revert back under same assumptions as above. swap_activate
+ * checks that we don't swapon a copy-on-write file, but we also
+ * make sure that it doesn't become copy-on-write here.
*/
if (S_ISREG(mode)) {
if (inode->i_size == 0)
@@ -316,7 +318,12 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
ret = btrfs_set_prop(inode, "btrfs.compression", NULL, 0, 0);
if (ret && ret != -ENODATA)
goto out_drop;
- } else if (flags & FS_COMPR_FL) {
+ } else if (flags & FS_COMPR_FL && !IS_SWAPFILE(inode)) {
+ /*
+ * Like nodatacow, swap_activate checks that we don't swapon a
+ * compressed file, so we shouldn't let it become compressed.
+ */
+
const char *comp;
ip->flags |= BTRFS_INODE_COMPRESS;
@@ -330,7 +337,6 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
comp, strlen(comp), 0);
if (ret)
goto out_drop;
-
} else {
ret = btrfs_set_prop(inode, "btrfs.compression", NULL, 0, 0);
if (ret && ret != -ENODATA)
@@ -647,6 +653,12 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
return -EINVAL;
+ if (atomic_read(&root->nr_swapfiles)) {
+ btrfs_err(root->fs_info,
+ "cannot create snapshot with active swapfile");
+ return -ETXTBSY;
+ }
+
atomic_inc(&root->will_be_snapshoted);
smp_mb__after_atomic();
btrfs_wait_nocow_write(root);
@@ -1292,6 +1304,12 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
compress_type = range->compress_type;
}
+ mutex_lock(&inode->i_mutex);
+ ret = IS_SWAPFILE(inode) ? -ETXTBSY : 0;
+ mutex_unlock(&inode->i_mutex);
+ if (ret)
+ return ret;
+
if (extent_thresh == 0)
extent_thresh = 256 * 1024;
@@ -2927,6 +2945,11 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
btrfs_double_lock(src, loff, dst, dst_loff, len);
+ if (IS_SWAPFILE(src) || IS_SWAPFILE(dst)) {
+ ret = -ETXTBSY;
+ goto out_unlock;
+ }
+
ret = extent_same_check_offsets(src, loff, len);
if (ret)
goto out_unlock;
@@ -3644,6 +3667,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
mutex_lock(&src->i_mutex);
}
+ ret = -ETXTBSY;
+ if (IS_SWAPFILE(src) || IS_SWAPFILE(inode))
+ goto out_unlock;
+
/* determine range to clone */
ret = -EINVAL;
if (off + len > src->i_size || off + len < off)
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 6/7] btrfs: add EXTENT_FLAG_SWAPFILE
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
` (4 preceding siblings ...)
2014-12-10 1:45 ` [RFC PATCH v3 5/7] btrfs: prevent ioctls from interfering with a swap file Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-12 10:32 ` David Sterba
2014-12-10 1:45 ` [RFC PATCH v3 7/7] btrfs: enable swap file support Omar Sandoval
2014-12-12 10:32 ` [RFC PATCH v3 0/7] btrfs: implement " David Sterba
7 siblings, 1 reply; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval
Extents mapping a swap file should remain pinned in memory in order to
avoid doing allocations to look up an extent when we're already low on
memory. Rather than overloading EXTENT_FLAG_PINNED, add a new flag
specifically for this purpose.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/btrfs/extent_io.c | 1 +
fs/btrfs/extent_map.h | 1 +
fs/btrfs/inode.c | 1 +
include/trace/events/btrfs.h | 3 ++-
4 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bf3f424..36166d0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4244,6 +4244,7 @@ int try_release_extent_mapping(struct extent_map_tree *map,
break;
}
if (test_bit(EXTENT_FLAG_PINNED, &em->flags) ||
+ test_bit(EXTENT_FLAG_SWAPFILE, &em->flags) ||
em->start != start) {
write_unlock(&map->lock);
free_extent_map(em);
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index b2991fd..93b9548 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -16,6 +16,7 @@
#define EXTENT_FLAG_LOGGING 4 /* Logging this extent */
#define EXTENT_FLAG_FILLING 5 /* Filling in a preallocated extent */
#define EXTENT_FLAG_FS_MAPPING 6 /* filesystem extent mapping type */
+#define EXTENT_FLAG_SWAPFILE 7 /* this extent maps a swap file */
struct extent_map {
struct rb_node rb_node;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..7c2dfb2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6353,6 +6353,7 @@ again:
else
goto out;
}
+ WARN_ON_ONCE(IS_SWAPFILE(inode));
em = alloc_extent_map();
if (!em) {
err = -ENOMEM;
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 1faecea..5c5f9de 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -164,7 +164,8 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
{ (1 << EXTENT_FLAG_PREALLOC), "PREALLOC" },\
{ (1 << EXTENT_FLAG_LOGGING), "LOGGING" },\
{ (1 << EXTENT_FLAG_FILLING), "FILLING" },\
- { (1 << EXTENT_FLAG_FS_MAPPING), "FS_MAPPING" })
+ { (1 << EXTENT_FLAG_FS_MAPPING), "FS_MAPPING" },\
+ { (1 << EXTENT_FLAG_SWAPFILE), "SWAPFILE" })
TRACE_EVENT_CONDITION(btrfs_get_extent,
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v3 6/7] btrfs: add EXTENT_FLAG_SWAPFILE
2014-12-10 1:45 ` [RFC PATCH v3 6/7] btrfs: add EXTENT_FLAG_SWAPFILE Omar Sandoval
@ 2014-12-12 10:32 ` David Sterba
0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2014-12-12 10:32 UTC (permalink / raw)
To: Omar Sandoval
Cc: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
On Tue, Dec 09, 2014 at 05:45:47PM -0800, Omar Sandoval wrote:
> Extents mapping a swap file should remain pinned in memory in order to
> avoid doing allocations to look up an extent when we're already low on
> memory. Rather than overloading EXTENT_FLAG_PINNED, add a new flag
> specifically for this purpose.
>
> Signed-off-by: Omar Sandoval <osandov@osandov.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v3 7/7] btrfs: enable swap file support
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
` (5 preceding siblings ...)
2014-12-10 1:45 ` [RFC PATCH v3 6/7] btrfs: add EXTENT_FLAG_SWAPFILE Omar Sandoval
@ 2014-12-10 1:45 ` Omar Sandoval
2014-12-12 10:51 ` David Sterba
2014-12-12 10:32 ` [RFC PATCH v3 0/7] btrfs: implement " David Sterba
7 siblings, 1 reply; 13+ messages in thread
From: Omar Sandoval @ 2014-12-10 1:45 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
Cc: Omar Sandoval
Implement the swap file a_ops on btrfs. Activation does two things:
1. Checks for a usable swap file: it must be fully allocated (no holes),
support direct I/O (so no compressed or inline extents) and must be
eligible for nocow in its entirety in order to avoid doing a bunch of
allocations for a COW when we're already low on memory
2. Pins the extent maps in memory with EXTENT_FLAG_SWAPFILE
Deactivation unpins all of the extent maps.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/btrfs/inode.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 131 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7c2dfb2..76b58d7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7191,6 +7191,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
* this will cow the extent, reset the len in case we changed
* it above
*/
+ WARN_ON_ONCE(IS_SWAPFILE(inode));
len = bh_result->b_size;
free_extent_map(em);
em = btrfs_new_extent_direct(inode, start, len);
@@ -9443,6 +9444,134 @@ out_inode:
}
+static void __clear_swapfile_extents(struct inode *inode)
+{
+ u64 isize = inode->i_size;
+ struct extent_map *em;
+ u64 start, len;
+
+ start = 0;
+ while (start < isize) {
+ len = isize - start;
+ em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
+ if (IS_ERR(em))
+ return;
+
+ clear_bit(EXTENT_FLAG_SWAPFILE, &em->flags);
+
+ start = extent_map_end(em);
+ free_extent_map(em);
+ }
+}
+
+static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+ sector_t *span)
+{
+ struct inode *inode = file_inode(file);
+ struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+ struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+ int ret = 0;
+ u64 isize = inode->i_size;
+ struct extent_state *cached_state = NULL;
+ struct extent_map *em;
+ u64 start, len;
+
+ if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
+ /* Can't do direct I/O on a compressed file. */
+ btrfs_err(fs_info, "swapfile is compressed");
+ return -EINVAL;
+ }
+ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
+ /*
+ * Going through the copy-on-write path while swapping pages
+ * in/out and doing a bunch of allocations could stress the
+ * memory management code that got us there in the first place,
+ * and that's sure to be a bad time.
+ */
+ btrfs_err(fs_info, "swapfile is copy-on-write");
+ return -EINVAL;
+ }
+
+ lock_extent_bits(io_tree, 0, isize - 1, 0, &cached_state);
+
+ /*
+ * All of the extents must be allocated and support direct I/O. Inline
+ * extents and compressed extents fall back to buffered I/O, so those
+ * are no good. Additionally, all of the extents must be safe for nocow.
+ */
+ atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
+ start = 0;
+ while (start < isize) {
+ len = isize - start;
+ em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
+ if (IS_ERR(em)) {
+ ret = PTR_ERR(em);
+ goto out;
+ }
+
+ if (test_bit(EXTENT_FLAG_VACANCY, &em->flags) ||
+ em->block_start == EXTENT_MAP_HOLE) {
+ btrfs_err(fs_info, "swapfile has holes");
+ ret = -EINVAL;
+ goto out;
+ }
+ if (em->block_start == EXTENT_MAP_INLINE) {
+ /*
+ * It's unlikely we'll ever actually find ourselves
+ * here, as a file small enough to fit inline won't be
+ * big enough to store more than the swap header, but in
+ * case something changes in the future, let's catch it
+ * here rather than later.
+ */
+ btrfs_err(fs_info, "swapfile is inline");
+ ret = -EINVAL;
+ goto out;
+ }
+ if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
+ btrfs_err(fs_info, "swapfile is compresed");
+ ret = -EINVAL;
+ goto out;
+ }
+ ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
+ if (ret < 0) {
+ goto out;
+ } else if (ret == 1) {
+ ret = 0;
+ } else {
+ btrfs_err(fs_info, "swapfile has extent requiring COW (%llu-%llu)",
+ start, start + len - 1);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ set_bit(EXTENT_FLAG_SWAPFILE, &em->flags);
+
+ start = extent_map_end(em);
+ free_extent_map(em);
+ }
+
+out:
+ if (ret) {
+ __clear_swapfile_extents(inode);
+ atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
+ }
+ unlock_extent_cached(io_tree, 0, isize - 1, &cached_state, GFP_NOFS);
+ return ret;
+}
+
+static void btrfs_swap_deactivate(struct file *file)
+{
+ struct inode *inode = file_inode(file);
+ struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+ struct extent_state *cached_state = NULL;
+ u64 isize = inode->i_size;
+
+ lock_extent_bits(io_tree, 0, isize - 1, 0, &cached_state);
+ __clear_swapfile_extents(inode);
+ unlock_extent_cached(io_tree, 0, isize - 1, &cached_state, GFP_NOFS);
+ atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
+}
+
static const struct inode_operations btrfs_dir_inode_operations = {
.getattr = btrfs_getattr,
.lookup = btrfs_lookup,
@@ -9520,6 +9649,8 @@ static const struct address_space_operations btrfs_aops = {
.releasepage = btrfs_releasepage,
.set_page_dirty = btrfs_set_page_dirty,
.error_remove_page = generic_error_remove_page,
+ .swap_activate = btrfs_swap_activate,
+ .swap_deactivate = btrfs_swap_deactivate,
};
static const struct address_space_operations btrfs_symlink_aops = {
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v3 7/7] btrfs: enable swap file support
2014-12-10 1:45 ` [RFC PATCH v3 7/7] btrfs: enable swap file support Omar Sandoval
@ 2014-12-12 10:51 ` David Sterba
2014-12-12 20:00 ` Omar Sandoval
0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2014-12-12 10:51 UTC (permalink / raw)
To: Omar Sandoval
Cc: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
On Tue, Dec 09, 2014 at 05:45:48PM -0800, Omar Sandoval wrote:
> +static void __clear_swapfile_extents(struct inode *inode)
> +{
> + u64 isize = inode->i_size;
> + struct extent_map *em;
> + u64 start, len;
> +
> + start = 0;
> + while (start < isize) {
> + len = isize - start;
> + em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
> + if (IS_ERR(em))
> + return;
This could transiently fail if there's no memory to allocate the em, and
would leak the following extents.
> +
> + clear_bit(EXTENT_FLAG_SWAPFILE, &em->flags);
> +
> + start = extent_map_end(em);
> + free_extent_map(em);
> + }
> +}
> +
> +static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
> + sector_t *span)
> +{
> + struct inode *inode = file_inode(file);
> + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> + struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> + int ret = 0;
> + u64 isize = inode->i_size;
> + struct extent_state *cached_state = NULL;
> + struct extent_map *em;
> + u64 start, len;
> +
> + if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
> + /* Can't do direct I/O on a compressed file. */
> + btrfs_err(fs_info, "swapfile is compressed");
> + return -EINVAL;
> + }
> + if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
> + /*
> + * Going through the copy-on-write path while swapping pages
> + * in/out and doing a bunch of allocations could stress the
> + * memory management code that got us there in the first place,
> + * and that's sure to be a bad time.
> + */
> + btrfs_err(fs_info, "swapfile is copy-on-write");
> + return -EINVAL;
> + }
> +
> + lock_extent_bits(io_tree, 0, isize - 1, 0, &cached_state);
> +
> + /*
> + * All of the extents must be allocated and support direct I/O. Inline
> + * extents and compressed extents fall back to buffered I/O, so those
> + * are no good. Additionally, all of the extents must be safe for nocow.
> + */
> + atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
> + start = 0;
> + while (start < isize) {
> + len = isize - start;
> + em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
> + if (IS_ERR(em)) {
IS_ERR_OR_NULL(em)
>From now on the em is valid and has to be free_extent_map()ed ...
> + ret = PTR_ERR(em);
> + goto out;
> + }
> +
> + if (test_bit(EXTENT_FLAG_VACANCY, &em->flags) ||
> + em->block_start == EXTENT_MAP_HOLE) {
> + btrfs_err(fs_info, "swapfile has holes");
> + ret = -EINVAL;
... and all the error branches would miss it.
> + goto out;
> + }
> + if (em->block_start == EXTENT_MAP_INLINE) {
> + /*
> + * It's unlikely we'll ever actually find ourselves
> + * here, as a file small enough to fit inline won't be
> + * big enough to store more than the swap header, but in
> + * case something changes in the future, let's catch it
> + * here rather than later.
> + */
> + btrfs_err(fs_info, "swapfile is inline");
> + ret = -EINVAL;
here
> + goto out;
> + }
> + if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
> + btrfs_err(fs_info, "swapfile is compresed");
> + ret = -EINVAL;
here
> + goto out;
> + }
> + ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
> + if (ret < 0) {
here
> + goto out;
> + } else if (ret == 1) {
> + ret = 0;
> + } else {
> + btrfs_err(fs_info, "swapfile has extent requiring COW (%llu-%llu)",
> + start, start + len - 1);
> + ret = -EINVAL;
here
> + goto out;
> + }
> +
> + set_bit(EXTENT_FLAG_SWAPFILE, &em->flags);
> +
> + start = extent_map_end(em);
> + free_extent_map(em);
> + }
> +
> +out:
> + if (ret) {
should be fixed by:
if (!IS_ERR_OR_NULL(em))
free_extent_map(em);
> + __clear_swapfile_extents(inode);
> + atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
> + }
> + unlock_extent_cached(io_tree, 0, isize - 1, &cached_state, GFP_NOFS);
> + return ret;
> +}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v3 7/7] btrfs: enable swap file support
2014-12-12 10:51 ` David Sterba
@ 2014-12-12 20:00 ` Omar Sandoval
0 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-12 20:00 UTC (permalink / raw)
To: David Sterba
Cc: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, linux-btrfs, linux-fsdevel,
linux-mm, linux-nfs, linux-kernel
On Fri, Dec 12, 2014 at 11:51:22AM +0100, David Sterba wrote:
> On Tue, Dec 09, 2014 at 05:45:48PM -0800, Omar Sandoval wrote:
> > +static void __clear_swapfile_extents(struct inode *inode)
> > +{
> > + u64 isize = inode->i_size;
> > + struct extent_map *em;
> > + u64 start, len;
> > +
> > + start = 0;
> > + while (start < isize) {
> > + len = isize - start;
> > + em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
> > + if (IS_ERR(em))
> > + return;
>
> This could transiently fail if there's no memory to allocate the em, and
> would leak the following extents.
>
This leak I was aware of, and at the time I didn't see a good way to get
around it. After all, if we can't get the current extent, there's no way
to iterate through the rest of them. Now I see that instead of doing
this at the btrfs_get_extent level, I can just go through all of the
extent_maps in the extent_map_tree.
> > +
> > + clear_bit(EXTENT_FLAG_SWAPFILE, &em->flags);
> > +
> > + start = extent_map_end(em);
> > + free_extent_map(em);
> > + }
> > +}
> > +
> > +static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
> > + sector_t *span)
> > +{
> > + struct inode *inode = file_inode(file);
> > + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> > + struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> > + int ret = 0;
> > + u64 isize = inode->i_size;
> > + struct extent_state *cached_state = NULL;
> > + struct extent_map *em;
> > + u64 start, len;
> > +
> > + if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
> > + /* Can't do direct I/O on a compressed file. */
> > + btrfs_err(fs_info, "swapfile is compressed");
> > + return -EINVAL;
> > + }
> > + if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
> > + /*
> > + * Going through the copy-on-write path while swapping pages
> > + * in/out and doing a bunch of allocations could stress the
> > + * memory management code that got us there in the first place,
> > + * and that's sure to be a bad time.
> > + */
> > + btrfs_err(fs_info, "swapfile is copy-on-write");
> > + return -EINVAL;
> > + }
> > +
> > + lock_extent_bits(io_tree, 0, isize - 1, 0, &cached_state);
> > +
> > + /*
> > + * All of the extents must be allocated and support direct I/O. Inline
> > + * extents and compressed extents fall back to buffered I/O, so those
> > + * are no good. Additionally, all of the extents must be safe for nocow.
> > + */
> > + atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
> > + start = 0;
> > + while (start < isize) {
> > + len = isize - start;
> > + em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
> > + if (IS_ERR(em)) {
>
> IS_ERR_OR_NULL(em)
>
> From now on the em is valid and has to be free_extent_map()ed ...
>
> > + ret = PTR_ERR(em);
> > + goto out;
> > + }
> > +
> > + if (test_bit(EXTENT_FLAG_VACANCY, &em->flags) ||
> > + em->block_start == EXTENT_MAP_HOLE) {
> > + btrfs_err(fs_info, "swapfile has holes");
> > + ret = -EINVAL;
>
> ... and all the error branches would miss it.
>
> > + goto out;
> > + }
> > + if (em->block_start == EXTENT_MAP_INLINE) {
> > + /*
> > + * It's unlikely we'll ever actually find ourselves
> > + * here, as a file small enough to fit inline won't be
> > + * big enough to store more than the swap header, but in
> > + * case something changes in the future, let's catch it
> > + * here rather than later.
> > + */
> > + btrfs_err(fs_info, "swapfile is inline");
> > + ret = -EINVAL;
>
> here
>
> > + goto out;
> > + }
> > + if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
> > + btrfs_err(fs_info, "swapfile is compresed");
> > + ret = -EINVAL;
>
> here
>
> > + goto out;
> > + }
> > + ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
> > + if (ret < 0) {
>
> here
>
> > + goto out;
> > + } else if (ret == 1) {
> > + ret = 0;
> > + } else {
> > + btrfs_err(fs_info, "swapfile has extent requiring COW (%llu-%llu)",
> > + start, start + len - 1);
> > + ret = -EINVAL;
>
> here
>
> > + goto out;
> > + }
> > +
> > + set_bit(EXTENT_FLAG_SWAPFILE, &em->flags);
> > +
> > + start = extent_map_end(em);
> > + free_extent_map(em);
> > + }
> > +
> > +out:
> > + if (ret) {
>
> should be fixed by:
>
> if (!IS_ERR_OR_NULL(em))
> free_extent_map(em);
>
> > + __clear_swapfile_extents(inode);
> > + atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
> > + }
> > + unlock_extent_cached(io_tree, 0, isize - 1, &cached_state, GFP_NOFS);
> > + return ret;
> > +}
This leak I completely missed. Thanks.
--
Omar
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v3 0/7] btrfs: implement swap file support
2014-12-10 1:45 [RFC PATCH v3 0/7] btrfs: implement swap file support Omar Sandoval
` (6 preceding siblings ...)
2014-12-10 1:45 ` [RFC PATCH v3 7/7] btrfs: enable swap file support Omar Sandoval
@ 2014-12-12 10:32 ` David Sterba
2014-12-12 20:15 ` Omar Sandoval
7 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2014-12-12 10:32 UTC (permalink / raw)
To: Omar Sandoval
Cc: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, David Sterba, linux-btrfs,
linux-fsdevel, linux-mm, linux-nfs, linux-kernel
On Tue, Dec 09, 2014 at 05:45:41PM -0800, Omar Sandoval wrote:
> After some discussion on the mailing list, I decided that for simplicity and
> reliability, it's best to simply disallow COW files and files with shared
> extents (like files with extents shared with a snapshot). From a user's
> perspective, this means that a snapshotted subvolume cannot be used for a swap
> file, but keeping the swap file in a separate subvolume that is never
> snapshotted seems entirely reasonable to me.
Well, there are enough special cases how to do things on btrfs and I'd
like to avoid introducing another one.
> An alternative suggestion was to
> allow swap files to be snapshotted and to do an implied COW on swap file
> activation, which I was ready to implement until I realized that we can't permit
> snapshotting a subvolume with an active swap file, so this creates a surprising
> inconsistency for users (in my opinion).
I still don't see why it's not possible to do the snapshot with an
active swapfile.
> As with before, this functionality is tenuously tested in a virtual machine with
> some artificial workloads, but it "works for me". I'm pretty happy with the
> results on my end, so please comment away.
The non-btrfs changes can go independently and do not have to wait until
we resolve the swap vs snapshot problem.
I did a simple test and it crashed instantly, lockep complains:
memory: 2G
swap file: 1G
kernel: 3.17 + v3
[ 739.790731] Adding 1054716k swap on /mnt/test-swap/mnt/swapfile. Priority:-1 extents:1 across:1054716k
[ 751.848607]
[ 751.851852] =====================================
[ 751.852161] [ BUG: bad unlock balance detected! ]
[ 751.852161] 3.17.0-default+ #199 Not tainted
[ 751.852161] -------------------------------------
[ 751.852161] heavy_swap/4119 is trying to release lock (&sb->s_type->i_mutex_key) at:
[ 751.852161] [<ffffffff81a4f0ce>] mutex_unlock+0xe/0x10
[ 751.852161] but there are no more locks to release!
[ 751.852161]
[ 751.852161] other info that might help us debug this:
[ 751.852161] 1 lock held by heavy_swap/4119:
[ 751.852161] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81043ba9>] __do_page_fault+0x149/0x560
[ 751.852161]
[ 751.852161] stack backtrace:
[ 751.852161] CPU: 1 PID: 4119 Comm: heavy_swap Not tainted 3.17.0-default+ #199
[ 751.852161] Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
[ 751.852161] ffffffff81a4f0ce ffff880075dbb3d8 ffffffff81a4b268 0000000000000001
[ 751.852161] ffff8800775e0000 ffff880075dbb408 ffffffff810b51a9 0000000000000000
[ 751.852161] ffff8800775e0000 00000000ffffffff ffff8800763c9d00 ffff880075dbb4a8
[ 751.852161] Call Trace:
[ 751.852161] [<ffffffff81a4f0ce>] ? mutex_unlock+0xe/0x10
[ 751.852161] [<ffffffff81a4b268>] dump_stack+0x51/0x71
[ 751.852161] [<ffffffff810b51a9>] print_unlock_imbalance_bug+0xf9/0x100
[ 751.852161] [<ffffffff810b8bcf>] lock_release_non_nested+0x2cf/0x3e0
[ 751.852161] [<ffffffff81a541ed>] ? ftrace_call+0x5/0x2f
[ 751.852161] [<ffffffff81a4f0ce>] ? mutex_unlock+0xe/0x10
[ 751.852161] [<ffffffff81a4f0ce>] ? mutex_unlock+0xe/0x10
[ 751.852161] [<ffffffff810b8da9>] lock_release+0xc9/0x240
[ 751.852161] [<ffffffff81a4eef0>] __mutex_unlock_slowpath+0x80/0x190
[ 751.852161] [<ffffffff81a4f0c9>] ? mutex_unlock+0x9/0x10
[ 751.852161] [<ffffffff81a4f0ce>] mutex_unlock+0xe/0x10
[ 751.852161] [<ffffffffa0037c88>] btrfs_direct_IO+0x2b8/0x310 [btrfs]
[ 751.852161] [<ffffffff810acf4d>] ? __wake_up_bit+0xd/0x50
[ 751.852161] [<ffffffff8117e03b>] __swap_writepage+0x10b/0x270
[ 751.852161] [<ffffffff81180723>] ? page_swapcount+0x53/0x70
[ 751.852161] [<ffffffff8117e1d7>] swap_writepage+0x37/0x60
[ 751.852161] [<ffffffff8115a072>] shmem_writepage+0x2a2/0x2e0
[ 751.852161] [<ffffffff811554ae>] shrink_page_list+0x44e/0x9d0
[ 751.852161] [<ffffffff81a51b50>] ? _raw_spin_unlock_irq+0x30/0x40
[ 751.852161] [<ffffffff811560cd>] shrink_inactive_list+0x26d/0x4f0
[ 751.852161] [<ffffffff813adcc9>] ? blk_start_plug+0x9/0x50
[ 751.852161] [<ffffffff81156918>] shrink_lruvec+0x5c8/0x6c0
[ 751.852161] [<ffffffff81165f09>] ? compaction_suitable+0x19/0xc0
[ 751.852161] [<ffffffff81165f09>] ? compaction_suitable+0x19/0xc0
[ 751.852161] [<ffffffff81156a5d>] shrink_zone+0x4d/0x120
[ 751.852161] [<ffffffff811577ea>] do_try_to_free_pages+0x19a/0x3a0
[ 751.852161] [<ffffffff81152a7d>] ? pfmemalloc_watermark_ok+0xd/0xc0
[ 751.852161] [<ffffffff81157b42>] try_to_free_pages+0xb2/0x160
[ 751.852161] [<ffffffff81a4bb79>] ? _cond_resched+0x9/0x30
[ 751.852161] [<ffffffff8114adfb>] __alloc_pages_nodemask+0x5eb/0xa90
[ 751.852161] [<ffffffff81a541ed>] ? ftrace_call+0x5/0x2f
[ 751.852161] [<ffffffff811784f1>] ? anon_vma_prepare+0x21/0x190
[ 751.852161] [<ffffffff811916a8>] do_huge_pmd_anonymous_page+0xe8/0x330
[ 751.852161] [<ffffffff811783c9>] ? is_vma_temporary_stack+0x9/0x30
[ 751.852161] [<ffffffff8116edd5>] handle_mm_fault+0x135/0xb60
[ 751.852161] [<ffffffff81172015>] ? find_vma+0x15/0x80
[ 751.852161] [<ffffffff81166c6d>] ? vmacache_find+0xd/0xd0
[ 751.852161] [<ffffffff81097c2e>] ? __might_sleep+0xe/0x110
[ 751.852161] [<ffffffff81043c0d>] __do_page_fault+0x1ad/0x560
[ 751.852161] [<ffffffff81073000>] ? do_fork+0xe0/0x420
[ 751.852161] [<ffffffff81a53f43>] ? error_sti+0x5/0x6
[ 751.852161] [<ffffffff813e660d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 751.852161] [<ffffffff810440cc>] do_page_fault+0xc/0x10
[ 751.852161] [<ffffffff81a53d42>] page_fault+0x22/0x30
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v3 0/7] btrfs: implement swap file support
2014-12-12 10:32 ` [RFC PATCH v3 0/7] btrfs: implement " David Sterba
@ 2014-12-12 20:15 ` Omar Sandoval
0 siblings, 0 replies; 13+ messages in thread
From: Omar Sandoval @ 2014-12-12 20:15 UTC (permalink / raw)
To: David Sterba
Cc: Alexander Viro, Andrew Morton, Chris Mason, Josef Bacik,
Trond Myklebust, Christoph Hellwig, linux-btrfs, linux-fsdevel,
linux-mm, linux-nfs, linux-kernel
On Fri, Dec 12, 2014 at 11:32:13AM +0100, David Sterba wrote:
> On Tue, Dec 09, 2014 at 05:45:41PM -0800, Omar Sandoval wrote:
> > After some discussion on the mailing list, I decided that for simplicity and
> > reliability, it's best to simply disallow COW files and files with shared
> > extents (like files with extents shared with a snapshot). From a user's
> > perspective, this means that a snapshotted subvolume cannot be used for a swap
> > file, but keeping the swap file in a separate subvolume that is never
> > snapshotted seems entirely reasonable to me.
>
> Well, there are enough special cases how to do things on btrfs and I'd
> like to avoid introducing another one.
>
> > An alternative suggestion was to
> > allow swap files to be snapshotted and to do an implied COW on swap file
> > activation, which I was ready to implement until I realized that we can't permit
> > snapshotting a subvolume with an active swap file, so this creates a surprising
> > inconsistency for users (in my opinion).
>
> I still don't see why it's not possible to do the snapshot with an
> active swapfile.
>
Creating a snapshot of an active swapfile would create shared extents,
so the next time we have to swap out a page, we'd have to do a COW,
which we're already trying pretty hard to avoid. We could allow it, but
it might lead to some unreliable behavior and unhappy emails to the
mailing list. However, I do see your point about wanting to avoid
special cases, so I'd like to get some more input from others on this as
well.
> > As with before, this functionality is tenuously tested in a virtual machine with
> > some artificial workloads, but it "works for me". I'm pretty happy with the
> > results on my end, so please comment away.
>
> The non-btrfs changes can go independently and do not have to wait until
> we resolve the swap vs snapshot problem.
>
> I did a simple test and it crashed instantly, lockep complains:
>
> memory: 2G
> swap file: 1G
> kernel: 3.17 + v3
>
[snip]
That's my fault for not running with lockdep enabled. The problem here
is that swap-over-NFS is the only caller of nfs_direct_IO, so
nfs_direct_IO doesn't observe the normal direct_IO locking conventions
and neither does swap_writepage. I'll have to shuffle around some code
on the NFS side to fix that.
It looks like the non-btrfs parts of this might get a bit bigger, so
I'll look into getting that in separately.
Thanks!
--
Omar
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread