* [PATCH 1/2] Btrfs: don't compress for a small write
@ 2014-03-24 9:58 Wang Shilong
2014-03-24 9:58 ` [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way Wang Shilong
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Wang Shilong @ 2014-03-24 9:58 UTC (permalink / raw)
To: linux-btrfs
To compress a small write(<=blocksize) dosen't save us
disk space at all, skip it can save us some compression time.
This patch can also fix wrong setting nocompression flag for
inode, say a case when @total_in is 4096, and then we get
@total_compressed 52,because we do aligment to page cache size
firstly, and then we get into conclusion @total_in=@total_compressed
thus we will clear this inode's compression flag.
An exception comes from inserting inline extent failure but we
still have @total_compressed < @total_in,so we will still reset
inode's flag, this is ok, because we don't have good compression
effect.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
---
fs/btrfs/inode.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0ec8766..d3eccd6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -394,6 +394,15 @@ static noinline int compress_file_range(struct inode *inode,
(start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
btrfs_add_inode_defrag(NULL, inode);
+ /*
+ * if this is a small write(<=blocksize), we don't save
+ * disk space at all, don't compress which can save us some
+ * compression time.
+ */
+ if ((end - start + 1) <= blocksize &&
+ (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
+ goto cleanup_and_bail_uncompressed;
+
actual_end = min_t(u64, isize, end + 1);
again:
will_compress = 0;
--
1.9.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way
2014-03-24 9:58 [PATCH 1/2] Btrfs: don't compress for a small write Wang Shilong
@ 2014-03-24 9:58 ` Wang Shilong
2014-03-28 12:00 ` Wang Shilong
2014-03-26 18:10 ` [PATCH 1/2] Btrfs: don't compress for a small write David Sterba
2014-03-31 12:31 ` Chris Mason
2 siblings, 1 reply; 7+ messages in thread
From: Wang Shilong @ 2014-03-24 9:58 UTC (permalink / raw)
To: linux-btrfs
Steps to reproduce:
# mkfs.btrfs -f /dev/sda[8-11] -m raid5 -d raid5
# mount /dev/sda8 /mnt
# btrfs scrub start -BR /mnt
# echo $? <--unverified errors make return value be 3
This is because we don't setup right mapping between physical
and logical address for raid56, which makes checksum mismatch.
But we will find everthing is fine later when rechecking using
btrfs_map_block().
This patch fixed the problem by settuping right mappings and
we only verify data stripes' checksums.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
---
fs/btrfs/scrub.c | 33 +++++++++++++++++++++++++--------
1 file changed, 25 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index db21a13..4182b44a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2267,16 +2267,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
u64 extent_logical;
u64 extent_physical;
u64 extent_len;
+ u64 stripe_nr;
+ u64 tmp;
+ int stripe_index;
struct btrfs_device *extent_dev;
int extent_mirror_num;
- int stop_loop;
-
- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
- if (num >= nr_data_stripes(map)) {
- return 0;
- }
- }
+ int stop_loop = 0;
nstripes = length;
offset = 0;
@@ -2296,6 +2292,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
increment = map->stripe_len;
mirror_num = num % map->num_stripes + 1;
+ } else if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
+ BTRFS_BLOCK_GROUP_RAID6)) {
+ if (num < nr_data_stripes(map))
+ offset = num * map->stripe_len;
+ else
+ offset = (nr_data_stripes(map) - 1) * map->stripe_len;
+ increment = map->stripe_len * nr_data_stripes(map);
+ mirror_num = 1;
} else {
increment = map->stripe_len;
mirror_num = 1;
@@ -2361,6 +2365,18 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
logic_end = logical + increment * nstripes;
ret = 0;
while (logical < logic_end) {
+ /* skip parity */
+ if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
+ BTRFS_BLOCK_GROUP_RAID6)) {
+ stripe_nr = logical - base;
+ do_div(stripe_nr, map->stripe_len);
+
+ stripe_index = do_div(stripe_nr, nr_data_stripes(map));
+ tmp = stripe_nr + stripe_index;
+ stripe_index = do_div(tmp, map->num_stripes);
+ if (stripe_index != num)
+ goto skip;
+ }
/*
* canceled?
*/
@@ -2521,6 +2537,7 @@ next:
path->slots[0]++;
}
btrfs_release_path(path);
+skip:
logical += increment;
physical += map->stripe_len;
spin_lock(&sctx->stat_lock);
--
1.9.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Btrfs: don't compress for a small write
2014-03-24 9:58 [PATCH 1/2] Btrfs: don't compress for a small write Wang Shilong
2014-03-24 9:58 ` [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way Wang Shilong
@ 2014-03-26 18:10 ` David Sterba
2014-03-27 3:06 ` Wang Shilong
2014-03-31 12:31 ` Chris Mason
2 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2014-03-26 18:10 UTC (permalink / raw)
To: Wang Shilong; +Cc: linux-btrfs
On Mon, Mar 24, 2014 at 05:58:10PM +0800, Wang Shilong wrote:
> To compress a small write(<=blocksize) dosen't save us
> disk space at all, skip it can save us some compression time.
The compressibility depends on the data, a block full of zeros can
compress pretty well, so your patch is too limiting IMO.
> This patch can also fix wrong setting nocompression flag for
> inode, say a case when @total_in is 4096, and then we get
> @total_compressed 52,because we do aligment to page cache size
> firstly, and then we get into conclusion @total_in=@total_compressed
> thus we will clear this inode's compression flag.
This is a bug but can be fixed without disabling compression of small
blocks.
I have a similar patch as part of the large compression update, the
logic that decides if the small extent should be compressed or not
depends on the compression algo and some typical data samples. for zlib
it's around ~100 B and lzo at ~200 B. That's a boundary where compressed
size == uncompressed, so there's no benefit, only additional overhead.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Btrfs: don't compress for a small write
2014-03-26 18:10 ` [PATCH 1/2] Btrfs: don't compress for a small write David Sterba
@ 2014-03-27 3:06 ` Wang Shilong
0 siblings, 0 replies; 7+ messages in thread
From: Wang Shilong @ 2014-03-27 3:06 UTC (permalink / raw)
To: dsterba, linux-btrfs
Not really, you just miss inline part, say we compress a page
to 50B, but this inode's data can not be inlined(see inline check),
we still have to allocate 'blocksize'(min allocating unit) disk space
and then write data and zero filled datas into disk.
This patch only skips data can not be inlined, so i think this make senses.
Thanks,
Wang
On 03/27/2014 02:10 AM, David Sterba wrote:
> On Mon, Mar 24, 2014 at 05:58:10PM +0800, Wang Shilong wrote:
>> To compress a small write(<=blocksize) dosen't save us
>> disk space at all, skip it can save us some compression time.
> The compressibility depends on the data, a block full of zeros can
> compress pretty well, so your patch is too limiting IMO.
>
>> This patch can also fix wrong setting nocompression flag for
>> inode, say a case when @total_in is 4096, and then we get
>> @total_compressed 52,because we do aligment to page cache size
>> firstly, and then we get into conclusion @total_in=@total_compressed
>> thus we will clear this inode's compression flag.
> This is a bug but can be fixed without disabling compression of small
> blocks.
>
> I have a similar patch as part of the large compression update, the
> logic that decides if the small extent should be compressed or not
> depends on the compression algo and some typical data samples. for zlib
> it's around ~100 B and lzo at ~200 B. That's a boundary where compressed
> size == uncompressed, so there's no benefit, only additional overhead.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way
2014-03-24 9:58 ` [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way Wang Shilong
@ 2014-03-28 12:00 ` Wang Shilong
0 siblings, 0 replies; 7+ messages in thread
From: Wang Shilong @ 2014-03-28 12:00 UTC (permalink / raw)
To: linux-btrfs
Oops, this patch is not working right, please ignore this one,
i will send a v2.
Thanks,
Wang
On 03/24/2014 05:58 PM, Wang Shilong wrote:
> Steps to reproduce:
> # mkfs.btrfs -f /dev/sda[8-11] -m raid5 -d raid5
> # mount /dev/sda8 /mnt
> # btrfs scrub start -BR /mnt
> # echo $? <--unverified errors make return value be 3
>
> This is because we don't setup right mapping between physical
> and logical address for raid56, which makes checksum mismatch.
> But we will find everthing is fine later when rechecking using
> btrfs_map_block().
>
> This patch fixed the problem by settuping right mappings and
> we only verify data stripes' checksums.
>
> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
> ---
> fs/btrfs/scrub.c | 33 +++++++++++++++++++++++++--------
> 1 file changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index db21a13..4182b44a 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2267,16 +2267,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
> u64 extent_logical;
> u64 extent_physical;
> u64 extent_len;
> + u64 stripe_nr;
> + u64 tmp;
> + int stripe_index;
> struct btrfs_device *extent_dev;
> int extent_mirror_num;
> - int stop_loop;
> -
> - if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
> - BTRFS_BLOCK_GROUP_RAID6)) {
> - if (num >= nr_data_stripes(map)) {
> - return 0;
> - }
> - }
> + int stop_loop = 0;
>
> nstripes = length;
> offset = 0;
> @@ -2296,6 +2292,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
> } else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
> increment = map->stripe_len;
> mirror_num = num % map->num_stripes + 1;
> + } else if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
> + BTRFS_BLOCK_GROUP_RAID6)) {
> + if (num < nr_data_stripes(map))
> + offset = num * map->stripe_len;
> + else
> + offset = (nr_data_stripes(map) - 1) * map->stripe_len;
> + increment = map->stripe_len * nr_data_stripes(map);
> + mirror_num = 1;
> } else {
> increment = map->stripe_len;
> mirror_num = 1;
> @@ -2361,6 +2365,18 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
> logic_end = logical + increment * nstripes;
> ret = 0;
> while (logical < logic_end) {
> + /* skip parity */
> + if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
> + BTRFS_BLOCK_GROUP_RAID6)) {
> + stripe_nr = logical - base;
> + do_div(stripe_nr, map->stripe_len);
> +
> + stripe_index = do_div(stripe_nr, nr_data_stripes(map));
> + tmp = stripe_nr + stripe_index;
> + stripe_index = do_div(tmp, map->num_stripes);
> + if (stripe_index != num)
> + goto skip;
> + }
> /*
> * canceled?
> */
> @@ -2521,6 +2537,7 @@ next:
> path->slots[0]++;
> }
> btrfs_release_path(path);
> +skip:
> logical += increment;
> physical += map->stripe_len;
> spin_lock(&sctx->stat_lock);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Btrfs: don't compress for a small write
2014-03-24 9:58 [PATCH 1/2] Btrfs: don't compress for a small write Wang Shilong
2014-03-24 9:58 ` [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way Wang Shilong
2014-03-26 18:10 ` [PATCH 1/2] Btrfs: don't compress for a small write David Sterba
@ 2014-03-31 12:31 ` Chris Mason
2014-03-31 12:53 ` Shilong Wang
2 siblings, 1 reply; 7+ messages in thread
From: Chris Mason @ 2014-03-31 12:31 UTC (permalink / raw)
To: Wang Shilong, linux-btrfs
On 03/24/2014 05:58 AM, Wang Shilong wrote:
> To compress a small write(<=blocksize) dosen't save us
> disk space at all, skip it can save us some compression time.
>
> This patch can also fix wrong setting nocompression flag for
> inode, say a case when @total_in is 4096, and then we get
> @total_compressed 52,because we do aligment to page cache size
> firstly, and then we get into conclusion @total_in=@total_compressed
> thus we will clear this inode's compression flag.
>
> An exception comes from inserting inline extent failure but we
> still have @total_compressed < @total_in,so we will still reset
> inode's flag, this is ok, because we don't have good compression
> effect.
>
So your check for start > 0 || end + 1 < disk_i_size means we're only
skipping compression for a small file range that isn't in an inline
extent. Could you please update the patch description and comments in
the code to reflect this?
Thanks!
-chris
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Btrfs: don't compress for a small write
2014-03-31 12:31 ` Chris Mason
@ 2014-03-31 12:53 ` Shilong Wang
0 siblings, 0 replies; 7+ messages in thread
From: Shilong Wang @ 2014-03-31 12:53 UTC (permalink / raw)
To: Chris Mason; +Cc: Wang Shilong, linux-btrfs
2014-03-31 20:31 GMT+08:00 Chris Mason <clm@fb.com>:
>
>
> On 03/24/2014 05:58 AM, Wang Shilong wrote:
>>
>> To compress a small write(<=blocksize) dosen't save us
>> disk space at all, skip it can save us some compression time.
>>
>> This patch can also fix wrong setting nocompression flag for
>> inode, say a case when @total_in is 4096, and then we get
>> @total_compressed 52,because we do aligment to page cache size
>> firstly, and then we get into conclusion @total_in=@total_compressed
>> thus we will clear this inode's compression flag.
>>
>> An exception comes from inserting inline extent failure but we
>> still have @total_compressed < @total_in,so we will still reset
>> inode's flag, this is ok, because we don't have good compression
>> effect.
>>
>
> So your check for start > 0 || end + 1 < disk_i_size means we're only
> skipping compression for a small file range that isn't in an inline extent.
> Could you please update the patch description and comments in the code to
> reflect this?
No problem, i will update this patch.~_~
>
> Thanks!
>
> -chris
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-03-31 12:53 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-24 9:58 [PATCH 1/2] Btrfs: don't compress for a small write Wang Shilong
2014-03-24 9:58 ` [PATCH 2/2] Btrfs: scrub raid56 stripes in the right way Wang Shilong
2014-03-28 12:00 ` Wang Shilong
2014-03-26 18:10 ` [PATCH 1/2] Btrfs: don't compress for a small write David Sterba
2014-03-27 3:06 ` Wang Shilong
2014-03-31 12:31 ` Chris Mason
2014-03-31 12:53 ` Shilong Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).