* [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
@ 2015-06-22 22:47 ` Mark Fasheh
0 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
- ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+ /* pass original length for comparison so we stay within i_size */
+ ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
--
2.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
2015-06-23 21:28 [PATCH 0/5] btrfs: dedupe fixes, features V3 Mark Fasheh
@ 2015-06-23 21:28 ` Mark Fasheh
2015-06-25 3:04 ` Liu Bo
0 siblings, 1 reply; 10+ messages in thread
From: Mark Fasheh @ 2015-06-23 21:28 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
- ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+ /* pass original length for comparison so we stay within i_size */
+ ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
2015-06-23 21:28 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
@ 2015-06-25 3:04 ` Liu Bo
0 siblings, 0 replies; 10+ messages in thread
From: Liu Bo @ 2015-06-25 3:04 UTC (permalink / raw)
To: Mark Fasheh; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs
On Tue, Jun 23, 2015 at 02:28:36PM -0700, Mark Fasheh wrote:
> In the case that we dedupe the tail of a file, we might expand the dedupe
> len out to the end of our last block. We don't want to compare data past
> i_size however, so pass the original length to btrfs_cmp_data().
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
>
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> Reviewed-by: David Sterba <dsterba@suse.cz>
> ---
> fs/btrfs/ioctl.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 2d24ff4..2deea1f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> goto out_unlock;
> }
>
> - ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
> + /* pass original length for comparison so we stay within i_size */
> + ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
> if (ret == 0)
> ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
>
> --
> 2.1.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
@ 2015-06-26 21:00 ` Mark Fasheh
0 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:00 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
- ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+ /* pass original length for comparison so we stay within i_size */
+ ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 0/5] btrfs: dedupe fixes, features V5
@ 2015-06-30 21:42 Mark Fasheh
2015-06-30 21:42 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs
Hi Chris,
The following patches are based on top of my patch titled "btrfs:
Handle unaligned length in extent_same" which you have in your
'integration-4.2' branch:
https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=e1d227a42ea2b4664f94212bd1106b9a3413ffb8
The series contains three fixes and two features.
The first patch in the series fixes a bug where we were sometimes
passing the aligned length to our comparison function. We actually can
stop at the user passed length for this as we don't need to compare
data past i_size (and we only align if the extents go out to i_size).
The 2nd patch fixes a deadlock between btrfs readpage and
btrfs_extent_same. This was reported on the list some months ago -
basically we had the page and extent locking reversed. My patch fixes
up the locking to be in the right order.
The 3rd patch fixes a deadlocks in clone() (wrt extent-same) which
David found while reviewing my fixes. I also found that clone doesn't
lock extent ranges in any particular order which could obvioulsy be a
problem so that is fixed there too.
The last two patches add features which have been requested often by
users - the 4th adds the ability to dedupe within the same inode, and
the last patch fixes up the clone code to skip updating mtime and
ctime when called from extent-same (this helps with backup software
which is being confused into re-backing up deduped files).
Changes from V4 to V5:
- don't update ctime either in patch #5
Changes from V3 to V4:
- change mtime patch from a flag to default behavior
Changes from V2 to V3:
- better flag naming in patch #5, thanks to David
Changes from V1 to V2:
- added patches 3-5
Thanks,
--Mark
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
@ 2015-06-30 21:42 ` Mark Fasheh
2015-06-30 21:42 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
- ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+ /* pass original length for comparison so we stay within i_size */
+ ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
2015-06-30 21:42 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
@ 2015-06-30 21:42 ` Mark Fasheh
2015-06-30 21:42 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
->readpage() does page_lock() before extent_lock(), we do the opposite in
extent-same. We want to reverse the order in btrfs_extent_same() but it's
not quite straightforward since the page locks are taken inside btrfs_cmp_data().
So I split btrfs_cmp_data() into 3 parts with a small context structure that
is passed between them. The first, btrfs_cmp_data_prepare() gathers up the
pages needed (taking page lock as required) and puts them on our context
structure. At this point, we are safe to lock the extent range. Afterwards,
we use btrfs_cmp_data() to do the data compare as usual and btrfs_cmp_data_free()
to clean up our context.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 148 +++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 117 insertions(+), 31 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2deea1f..b899584 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2755,14 +2755,11 @@ out:
return ret;
}
-static struct page *extent_same_get_page(struct inode *inode, u64 off)
+static struct page *extent_same_get_page(struct inode *inode, pgoff_t index)
{
struct page *page;
- pgoff_t index;
struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
- index = off >> PAGE_CACHE_SHIFT;
-
page = grab_cache_page(inode->i_mapping, index);
if (!page)
return NULL;
@@ -2783,6 +2780,20 @@ static struct page *extent_same_get_page(struct inode *inode, u64 off)
return page;
}
+static int gather_extent_pages(struct inode *inode, struct page **pages,
+ int num_pages, u64 off)
+{
+ int i;
+ pgoff_t index = off >> PAGE_CACHE_SHIFT;
+
+ for (i = 0; i < num_pages; i++) {
+ pages[i] = extent_same_get_page(inode, index + i);
+ if (!pages[i])
+ return -ENOMEM;
+ }
+ return 0;
+}
+
static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
{
/* do any pending delalloc/csum calc on src, one way or
@@ -2808,52 +2819,120 @@ static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
}
}
-static void btrfs_double_unlock(struct inode *inode1, u64 loff1,
- struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_unlock(struct inode *inode1, struct inode *inode2)
{
- unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
- unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
-
mutex_unlock(&inode1->i_mutex);
mutex_unlock(&inode2->i_mutex);
}
-static void btrfs_double_lock(struct inode *inode1, u64 loff1,
- struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
+{
+ if (inode1 < inode2)
+ swap(inode1, inode2);
+
+ mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+ if (inode1 != inode2)
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+}
+
+static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
+ struct inode *inode2, u64 loff2, u64 len)
+{
+ unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
+ unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
+}
+
+static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
+ struct inode *inode2, u64 loff2, u64 len)
{
if (inode1 < inode2) {
swap(inode1, inode2);
swap(loff1, loff2);
}
-
- mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
lock_extent_range(inode1, loff1, len);
- if (inode1 != inode2) {
- mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+ if (inode1 != inode2)
lock_extent_range(inode2, loff2, len);
+}
+
+struct cmp_pages {
+ int num_pages;
+ struct page **src_pages;
+ struct page **dst_pages;
+};
+
+static void btrfs_cmp_data_free(struct cmp_pages *cmp)
+{
+ int i;
+ struct page *pg;
+
+ for (i = 0; i < cmp->num_pages; i++) {
+ pg = cmp->src_pages[i];
+ if (pg)
+ page_cache_release(pg);
+ pg = cmp->dst_pages[i];
+ if (pg)
+ page_cache_release(pg);
+ }
+ kfree(cmp->src_pages);
+ kfree(cmp->dst_pages);
+}
+
+static int btrfs_cmp_data_prepare(struct inode *src, u64 loff,
+ struct inode *dst, u64 dst_loff,
+ u64 len, struct cmp_pages *cmp)
+{
+ int ret;
+ int num_pages = PAGE_CACHE_ALIGN(len) >> PAGE_CACHE_SHIFT;
+ struct page **src_pgarr, **dst_pgarr;
+
+ /*
+ * We must gather up all the pages before we initiate our
+ * extent locking. We use an array for the page pointers. Size
+ * of the array is bounded by len, which is in turn bounded by
+ * BTRFS_MAX_DEDUPE_LEN.
+ */
+ src_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
+ dst_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
+ if (!src_pgarr || !dst_pgarr) {
+ kfree(src_pgarr);
+ kfree(dst_pgarr);
+ return -ENOMEM;
}
+ cmp->num_pages = num_pages;
+ cmp->src_pages = src_pgarr;
+ cmp->dst_pages = dst_pgarr;
+
+ ret = gather_extent_pages(src, cmp->src_pages, cmp->num_pages, loff);
+ if (ret)
+ goto out;
+
+ ret = gather_extent_pages(dst, cmp->dst_pages, cmp->num_pages, dst_loff);
+
+out:
+ if (ret)
+ btrfs_cmp_data_free(cmp);
+ return 0;
}
static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
- u64 dst_loff, u64 len)
+ u64 dst_loff, u64 len, struct cmp_pages *cmp)
{
int ret = 0;
+ int i;
struct page *src_page, *dst_page;
unsigned int cmp_len = PAGE_CACHE_SIZE;
void *addr, *dst_addr;
+ i = 0;
while (len) {
if (len < PAGE_CACHE_SIZE)
cmp_len = len;
- src_page = extent_same_get_page(src, loff);
- if (!src_page)
- return -EINVAL;
- dst_page = extent_same_get_page(dst, dst_loff);
- if (!dst_page) {
- page_cache_release(src_page);
- return -EINVAL;
- }
+ BUG_ON(i >= cmp->num_pages);
+
+ src_page = cmp->src_pages[i];
+ dst_page = cmp->dst_pages[i];
+
addr = kmap_atomic(src_page);
dst_addr = kmap_atomic(dst_page);
@@ -2865,15 +2944,12 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
kunmap_atomic(addr);
kunmap_atomic(dst_addr);
- page_cache_release(src_page);
- page_cache_release(dst_page);
if (ret)
break;
- loff += cmp_len;
- dst_loff += cmp_len;
len -= cmp_len;
+ i++;
}
return ret;
@@ -2904,6 +2980,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
{
int ret;
u64 len = olen;
+ struct cmp_pages cmp;
/*
* btrfs_clone() can't handle extents in the same file
@@ -2916,7 +2993,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
if (len == 0)
return 0;
- btrfs_double_lock(src, loff, dst, dst_loff, len);
+ btrfs_double_inode_lock(src, dst);
ret = extent_same_check_offsets(src, loff, &len, olen);
if (ret)
@@ -2933,13 +3010,22 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
+ ret = btrfs_cmp_data_prepare(src, loff, dst, dst_loff, olen, &cmp);
+ if (ret)
+ goto out_unlock;
+
+ btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+
/* pass original length for comparison so we stay within i_size */
- ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
+ ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
+ btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+
+ btrfs_cmp_data_free(&cmp);
out_unlock:
- btrfs_double_unlock(src, loff, dst, dst_loff, len);
+ btrfs_double_inode_unlock(src, dst);
return ret;
}
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
2015-06-30 21:42 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
2015-06-30 21:42 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
@ 2015-06-30 21:42 ` Mark Fasheh
2015-06-30 21:42 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
2015-06-30 21:42 ` [PATCH 5/5] btrfs: don't update mtime/ctime on deduped inodes Mark Fasheh
4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.
In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 34 ++++++++--------------------------
1 file changed, 8 insertions(+), 26 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
swap(inode1, inode2);
mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
- if (inode1 != inode2)
- mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+ mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
}
static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
swap(loff1, loff2);
}
lock_extent_range(inode1, loff1, len);
- if (inode1 != inode2)
- lock_extent_range(inode2, loff2, len);
+ lock_extent_range(inode2, loff2, len);
}
struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
goto out_fput;
if (!same_inode) {
- if (inode < src) {
- mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
- } else {
- mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
- mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
- }
+ btrfs_double_inode_lock(src, inode);
} else {
mutex_lock(&src->i_mutex);
}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
lock_extent_range(src, lock_start, lock_len);
} else {
- lock_extent_range(src, off, len);
- lock_extent_range(inode, destoff, len);
+ btrfs_double_extent_lock(src, off, inode, destoff, len);
}
ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
} else {
- unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
- unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
- destoff + len - 1);
+ btrfs_double_extent_unlock(src, off, inode, destoff, len);
}
/*
* Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
truncate_inode_pages_range(&inode->i_data, destoff,
PAGE_CACHE_ALIGN(destoff + len) - 1);
out_unlock:
- if (!same_inode) {
- if (inode < src) {
- mutex_unlock(&src->i_mutex);
- mutex_unlock(&inode->i_mutex);
- } else {
- mutex_unlock(&inode->i_mutex);
- mutex_unlock(&src->i_mutex);
- }
- } else {
+ if (!same_inode)
+ btrfs_double_inode_unlock(src, inode);
+ else
mutex_unlock(&src->i_mutex);
- }
out_fput:
fdput(src_file);
out_drop_write:
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] btrfs: allow dedupe of same inode
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
` (2 preceding siblings ...)
2015-06-30 21:42 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
@ 2015-06-30 21:42 ` Mark Fasheh
2015-06-30 21:42 ` [PATCH 5/5] btrfs: don't update mtime/ctime on deduped inodes Mark Fasheh
4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
clone() supports cloning within an inode so extent-same can do
the same now. This patch fixes up the locking in extent-same to
know about the single-inode case. In addition to that, we add a
check for overlapping ranges, which clone does not allow.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
fs/btrfs/ioctl.c | 76 ++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 60 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8d6887d..83f4679 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2979,27 +2979,61 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
int ret;
u64 len = olen;
struct cmp_pages cmp;
+ int same_inode = 0;
+ u64 same_lock_start = 0;
+ u64 same_lock_len = 0;
- /*
- * btrfs_clone() can't handle extents in the same file
- * yet. Once that works, we can drop this check and replace it
- * with a check for the same inode, but overlapping extents.
- */
if (src == dst)
- return -EINVAL;
+ same_inode = 1;
if (len == 0)
return 0;
- btrfs_double_inode_lock(src, dst);
+ if (same_inode) {
+ mutex_lock(&src->i_mutex);
- ret = extent_same_check_offsets(src, loff, &len, olen);
- if (ret)
- goto out_unlock;
+ ret = extent_same_check_offsets(src, loff, &len, olen);
+ if (ret)
+ goto out_unlock;
- ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
- if (ret)
- goto out_unlock;
+ /*
+ * Single inode case wants the same checks, except we
+ * don't want our length pushed out past i_size as
+ * comparing that data range makes no sense.
+ *
+ * extent_same_check_offsets() will do this for an
+ * unaligned length at i_size, so catch it here and
+ * reject the request.
+ *
+ * This effectively means we require aligned extents
+ * for the single-inode case, whereas the other cases
+ * allow an unaligned length so long as it ends at
+ * i_size.
+ */
+ if (len != olen) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /* Check for overlapping ranges */
+ if (dst_loff + len > loff && dst_loff < loff + len) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ same_lock_start = min_t(u64, loff, dst_loff);
+ same_lock_len = max_t(u64, loff, dst_loff) + len - same_lock_start;
+ } else {
+ btrfs_double_inode_lock(src, dst);
+
+ ret = extent_same_check_offsets(src, loff, &len, olen);
+ if (ret)
+ goto out_unlock;
+
+ ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
+ if (ret)
+ goto out_unlock;
+ }
/* don't make the dst file partly checksummed */
if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
@@ -3012,18 +3046,28 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
if (ret)
goto out_unlock;
- btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+ if (same_inode)
+ lock_extent_range(src, same_lock_start, same_lock_len);
+ else
+ btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
/* pass original length for comparison so we stay within i_size */
ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
- btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+ if (same_inode)
+ unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
+ same_lock_start + same_lock_len - 1);
+ else
+ btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
btrfs_cmp_data_free(&cmp);
out_unlock:
- btrfs_double_inode_unlock(src, dst);
+ if (same_inode)
+ mutex_unlock(&src->i_mutex);
+ else
+ btrfs_double_inode_unlock(src, dst);
return ret;
}
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] btrfs: don't update mtime/ctime on deduped inodes
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
` (3 preceding siblings ...)
2015-06-30 21:42 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
@ 2015-06-30 21:42 ` Mark Fasheh
4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh
One issue users have reported is that dedupe changes mtime on files,
resulting in tools like rsync thinking that their contents have changed when
in fact the data is exactly the same. We also skip the ctime update as no
user-visible metadata changes here and we want dedupe to be transparent to
the user.
Clone still wants time changes, so we special case this in the code.
This was tested with the btrfs-extent-same tool.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
fs/btrfs/ioctl.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 83f4679..5fdad21 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
static int btrfs_clone(struct inode *src, struct inode *inode,
- u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+ u64 off, u64 olen, u64 olen_aligned, u64 destoff,
+ int no_time_update);
/* Mask out flags that are inappropriate for the given type of inode. */
static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
/* pass original length for comparison so we stay within i_size */
ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
if (ret == 0)
- ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
+ ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
if (same_inode)
unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3219,13 +3220,15 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
struct inode *inode,
u64 endoff,
const u64 destoff,
- const u64 olen)
+ const u64 olen,
+ int no_time_update)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
int ret;
inode_inc_iversion(inode);
- inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+ if (!no_time_update)
+ inode->i_mtime = inode->i_ctime = CURRENT_TIME;
/*
* We round up to the block size at eof when determining which
* extents to clone above, but shouldn't round up the file size.
@@ -3310,13 +3313,13 @@ static void clone_update_extent_map(struct inode *inode,
* @inode: Inode to clone to
* @off: Offset within source to start clone from
* @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- * identical values here
+ * @olen_aligned: Block-aligned value of olen
* @destoff: Offset within @inode to start clone
+ * @no_time_update: Whether to update mtime/ctime on the target inode
*/
static int btrfs_clone(struct inode *src, struct inode *inode,
const u64 off, const u64 olen, const u64 olen_aligned,
- const u64 destoff)
+ const u64 destoff, int no_time_update)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_path *path = NULL;
@@ -3640,7 +3643,8 @@ process_slot:
root->sectorsize);
ret = clone_finish_inode_update(trans, inode,
last_dest_end,
- destoff, olen);
+ destoff, olen,
+ no_time_update);
if (ret)
goto out;
if (new_key.offset + datal >= destoff + len)
@@ -3678,7 +3682,7 @@ process_slot:
clone_update_extent_map(inode, trans, NULL, last_dest_end,
destoff + len - last_dest_end);
ret = clone_finish_inode_update(trans, inode, destoff + len,
- destoff, olen);
+ destoff, olen, no_time_update);
}
out:
@@ -3808,7 +3812,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
btrfs_double_extent_lock(src, off, inode, destoff, len);
}
- ret = btrfs_clone(src, inode, off, olen, len, destoff);
+ ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
if (same_inode) {
u64 lock_start = min_t(u64, off, destoff);
--
2.1.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-06-30 21:42 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
2015-06-30 21:42 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
2015-06-30 21:42 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
2015-06-30 21:42 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
2015-06-30 21:42 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
2015-06-30 21:42 ` [PATCH 5/5] btrfs: don't update mtime/ctime on deduped inodes Mark Fasheh
-- strict thread matches above, loose matches on Subject: below --
2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
2015-06-26 21:00 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
2015-06-23 21:28 [PATCH 0/5] btrfs: dedupe fixes, features V3 Mark Fasheh
2015-06-23 21:28 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
2015-06-25 3:04 ` Liu Bo
2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
2015-06-22 22:47 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox