* [PATCH 0/3] Btrfs: populate heuristic with detection logic
@ 2017-07-24 11:37 Timofey Titovets
2017-07-24 11:37 ` [PATCH 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Timofey Titovets @ 2017-07-24 11:37 UTC (permalink / raw)
To: linux-btrfs; +Cc: Timofey Titovets
Based on kdave for-next
As heuristic skeleton already merged
Populate heuristic with basic code that:
1. Collect sample from input data
2. Calculate byte set for sample
For detect easily compressible data
3. Calculate byte core set size
For detect easily and not compressible data
Timofey Titovets (3):
Btrfs: heuristic add simple sampling logic
Btrfs: heuristic add byte set calculation
Btrfs: heuristic add byte core set calculation
fs/btrfs/compression.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/compression.h | 14 +++++++
2 files changed, 123 insertions(+), 2 deletions(-)
--
2.13.3
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] Btrfs: heuristic add simple sampling logic
2017-07-24 11:37 [PATCH 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
@ 2017-07-24 11:37 ` Timofey Titovets
2017-07-24 14:55 ` Josef Bacik
2017-07-24 11:37 ` [PATCH 2/3] Btrfs: heuristic add byte set calculation Timofey Titovets
` (2 subsequent siblings)
3 siblings, 1 reply; 6+ messages in thread
From: Timofey Titovets @ 2017-07-24 11:37 UTC (permalink / raw)
To: linux-btrfs; +Cc: Timofey Titovets
Get small sample from input data
and calculate byte type count for that sample
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
fs/btrfs/compression.c | 24 ++++++++++++++++++++++--
fs/btrfs/compression.h | 11 +++++++++++
2 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 63f54bd2d5bb..1501d4fe90cc 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1068,15 +1068,35 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
u64 index = start >> PAGE_SHIFT;
u64 end_index = end >> PAGE_SHIFT;
struct page *page;
- int ret = 1;
+ struct heuristic_bucket_item *bucket;
+ int a, b, ret;
+ u8 symbol, *input_data;
+
+ ret = 1;
+
+ bucket = kcalloc(BTRFS_HEURISTIC_BUCKET_SIZE,
+ sizeof(struct heuristic_bucket_item), GFP_NOFS);
+
+ if (!bucket)
+ goto out;
while (index <= end_index) {
page = find_get_page(inode->i_mapping, index);
- kmap(page);
+ input_data = kmap(page);
+ a = 0;
+ while (a < PAGE_SIZE) {
+ for (b = 0; b < BTRFS_HEURISTIC_READ_SIZE; b++) {
+ symbol = input_data[a+b];
+ bucket[symbol].count++;
+ }
+ a += BTRFS_HEURISTIC_ITERATOR_OFFSET;
+ }
kunmap(page);
put_page(page);
index++;
}
+out:
+ kfree(bucket);
return ret;
}
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index d1f4eee2d0af..984943e5e1ae 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -129,6 +129,17 @@ struct btrfs_compress_op {
extern const struct btrfs_compress_op btrfs_zlib_compress;
extern const struct btrfs_compress_op btrfs_lzo_compress;
+struct heuristic_bucket_item {
+ u8 padding;
+ u8 symbol;
+ u16 count;
+};
+
+#define BTRFS_HEURISTIC_READ_SIZE 16
+#define BTRFS_HEURISTIC_READS_PER_PAGE 8*PAGE_SIZE/4096
+#define BTRFS_HEURISTIC_ITERATOR_OFFSET PAGE_SIZE/BTRFS_HEURISTIC_READS_PER_PAGE
+#define BTRFS_HEURISTIC_BUCKET_SIZE 256
+
int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
#endif
--
2.13.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] Btrfs: heuristic add byte set calculation
2017-07-24 11:37 [PATCH 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
2017-07-24 11:37 ` [PATCH 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
@ 2017-07-24 11:37 ` Timofey Titovets
2017-07-24 11:37 ` [PATCH 3/3] Btrfs: heuristic add byte core " Timofey Titovets
2017-07-24 15:02 ` [PATCH 0/3] Btrfs: populate heuristic with detection logic Josef Bacik
3 siblings, 0 replies; 6+ messages in thread
From: Timofey Titovets @ 2017-07-24 11:37 UTC (permalink / raw)
To: linux-btrfs; +Cc: Timofey Titovets
Calculate byte set size for data sample
if byte set low, data are easily compressible
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
fs/btrfs/compression.c | 27 +++++++++++++++++++++++++++
fs/btrfs/compression.h | 1 +
2 files changed, 28 insertions(+)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1501d4fe90cc..fee7808e3f15 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1048,6 +1048,27 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
return 1;
}
+static inline int byte_set_size(const struct heuristic_bucket_item *bucket)
+{
+ int a = 0;
+ int byte_set_size = 0;
+
+ for (; a < BTRFS_HEURISTIC_BYTE_SET_THRESHOLD; a++) {
+ if (bucket[a].count > 0)
+ byte_set_size++;
+ }
+
+ for (; a < BTRFS_HEURISTIC_BUCKET_SIZE; a++) {
+ if (bucket[a].count > 0) {
+ byte_set_size++;
+ if (byte_set_size > BTRFS_HEURISTIC_BYTE_SET_THRESHOLD)
+ return byte_set_size;
+ }
+ }
+
+ return byte_set_size;
+}
+
/*
* Compression heuristic.
*
@@ -1096,6 +1117,12 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
index++;
}
+ a = byte_set_size(bucket);
+ if (a > BTRFS_HEURISTIC_BYTE_SET_THRESHOLD) {
+ ret = 1;
+ goto out;
+ }
+
out:
kfree(bucket);
return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 984943e5e1ae..6e5b2ce36b52 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -139,6 +139,7 @@ struct heuristic_bucket_item {
#define BTRFS_HEURISTIC_READS_PER_PAGE 8*PAGE_SIZE/4096
#define BTRFS_HEURISTIC_ITERATOR_OFFSET PAGE_SIZE/BTRFS_HEURISTIC_READS_PER_PAGE
#define BTRFS_HEURISTIC_BUCKET_SIZE 256
+#define BTRFS_HEURISTIC_BYTE_SET_THRESHOLD (256/4)
int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
--
2.13.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] Btrfs: heuristic add byte core set calculation
2017-07-24 11:37 [PATCH 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
2017-07-24 11:37 ` [PATCH 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
2017-07-24 11:37 ` [PATCH 2/3] Btrfs: heuristic add byte set calculation Timofey Titovets
@ 2017-07-24 11:37 ` Timofey Titovets
2017-07-24 15:02 ` [PATCH 0/3] Btrfs: populate heuristic with detection logic Josef Bacik
3 siblings, 0 replies; 6+ messages in thread
From: Timofey Titovets @ 2017-07-24 11:37 UTC (permalink / raw)
To: linux-btrfs; +Cc: Timofey Titovets
Calculate byte core set for data sample
For low core set, data are easily compressible
For high core set, data are not compressible
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
fs/btrfs/compression.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/compression.h | 2 ++
2 files changed, 62 insertions(+)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index fee7808e3f15..e7f811d20fd8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1069,6 +1069,42 @@ static inline int byte_set_size(const struct heuristic_bucket_item *bucket)
return byte_set_size;
}
+/* For bucket sorting */
+static inline int heuristic_bucket_compare(const void *lv, const void *rv)
+{
+ struct heuristic_bucket_item *l = (struct heuristic_bucket_item *)(lv);
+ struct heuristic_bucket_item *r = (struct heuristic_bucket_item *)(rv);
+
+ return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ */
+static inline int byte_core_set_size(struct heuristic_bucket_item *bucket,
+ u32 core_set_threshold)
+{
+ int a = 0;
+ u32 coreset_sum = 0;
+
+ for (; a < BTRFS_HEURISTIC_BYTE_CORE_SET_LOW; a++)
+ coreset_sum += bucket[a].count;
+
+ if (coreset_sum > core_set_threshold)
+ return a;
+
+ for (; a < BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH; a++) {
+ if (bucket[a].count == 0)
+ break;
+ coreset_sum += bucket[a].count;
+ if (coreset_sum > core_set_threshold)
+ break;
+ }
+
+ return a;
+}
+
/*
* Compression heuristic.
*
@@ -1092,6 +1128,8 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
struct heuristic_bucket_item *bucket;
int a, b, ret;
u8 symbol, *input_data;
+ u32 core_set_threshold;
+ u64 input_size = start - end;
ret = 1;
@@ -1123,6 +1161,28 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
goto out;
}
+ for (a = 1; a < BTRFS_HEURISTIC_BUCKET_SIZE; a++)
+ bucket[a].symbol = a;
+
+ /* Sort in reverse order */
+ sort(bucket, BTRFS_HEURISTIC_BUCKET_SIZE,
+ sizeof(struct heuristic_bucket_item), &heuristic_bucket_compare,
+ NULL);
+
+ core_set_threshold = input_size*90/BTRFS_HEURISTIC_ITERATOR_OFFSET/100;
+ core_set_threshold *= BTRFS_HEURISTIC_READ_SIZE;
+
+ a = byte_core_set_size(bucket, core_set_threshold);
+ if (a <= BTRFS_HEURISTIC_BYTE_CORE_SET_LOW) {
+ ret = 2;
+ goto out;
+ }
+
+ if (a >= BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH) {
+ ret = 0;
+ goto out;
+ }
+
out:
kfree(bucket);
return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 6e5b2ce36b52..f323f464bb13 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -140,6 +140,8 @@ struct heuristic_bucket_item {
#define BTRFS_HEURISTIC_ITERATOR_OFFSET PAGE_SIZE/BTRFS_HEURISTIC_READS_PER_PAGE
#define BTRFS_HEURISTIC_BUCKET_SIZE 256
#define BTRFS_HEURISTIC_BYTE_SET_THRESHOLD (256/4)
+#define BTRFS_HEURISTIC_BYTE_CORE_SET_LOW (256/5)
+#define BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH (256 - 256/5)
int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
--
2.13.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] Btrfs: heuristic add simple sampling logic
2017-07-24 11:37 ` [PATCH 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
@ 2017-07-24 14:55 ` Josef Bacik
0 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2017-07-24 14:55 UTC (permalink / raw)
To: Timofey Titovets; +Cc: linux-btrfs
On Mon, Jul 24, 2017 at 02:37:06PM +0300, Timofey Titovets wrote:
> Get small sample from input data
> and calculate byte type count for that sample
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
> fs/btrfs/compression.c | 24 ++++++++++++++++++++++--
> fs/btrfs/compression.h | 11 +++++++++++
> 2 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 63f54bd2d5bb..1501d4fe90cc 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1068,15 +1068,35 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
> u64 index = start >> PAGE_SHIFT;
> u64 end_index = end >> PAGE_SHIFT;
> struct page *page;
> - int ret = 1;
> + struct heuristic_bucket_item *bucket;
> + int a, b, ret;
> + u8 symbol, *input_data;
> +
> + ret = 1;
> +
> + bucket = kcalloc(BTRFS_HEURISTIC_BUCKET_SIZE,
> + sizeof(struct heuristic_bucket_item), GFP_NOFS);
> +
> + if (!bucket)
> + goto out;
>
> while (index <= end_index) {
> page = find_get_page(inode->i_mapping, index);
> - kmap(page);
> + input_data = kmap(page);
> + a = 0;
> + while (a < PAGE_SIZE) {
> + for (b = 0; b < BTRFS_HEURISTIC_READ_SIZE; b++) {
> + symbol = input_data[a+b];
> + bucket[symbol].count++;
> + }
> + a += BTRFS_HEURISTIC_ITERATOR_OFFSET;
> + }
> kunmap(page);
> put_page(page);
> index++;
> }
>
> +out:
> + kfree(bucket);
> return ret;
> }
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index d1f4eee2d0af..984943e5e1ae 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -129,6 +129,17 @@ struct btrfs_compress_op {
> extern const struct btrfs_compress_op btrfs_zlib_compress;
> extern const struct btrfs_compress_op btrfs_lzo_compress;
>
> +struct heuristic_bucket_item {
> + u8 padding;
> + u8 symbol;
> + u16 count;
> +};
> +
> +#define BTRFS_HEURISTIC_READ_SIZE 16
> +#define BTRFS_HEURISTIC_READS_PER_PAGE 8*PAGE_SIZE/4096
I hate magic numbers, why is this 8*PAGE_SIZE/4096? If you want to check every
512 bytes why not just set BTRFS_HEURISTIC_ITERATOR_OFFSET to 512? That makes
it easier to understand what you are trying to accomplish. Thanks,
Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] Btrfs: populate heuristic with detection logic
2017-07-24 11:37 [PATCH 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
` (2 preceding siblings ...)
2017-07-24 11:37 ` [PATCH 3/3] Btrfs: heuristic add byte core " Timofey Titovets
@ 2017-07-24 15:02 ` Josef Bacik
3 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2017-07-24 15:02 UTC (permalink / raw)
To: Timofey Titovets; +Cc: linux-btrfs
On Mon, Jul 24, 2017 at 02:37:05PM +0300, Timofey Titovets wrote:
> Based on kdave for-next
> As heuristic skeleton already merged
> Populate heuristic with basic code that:
> 1. Collect sample from input data
> 2. Calculate byte set for sample
> For detect easily compressible data
> 3. Calculate byte core set size
> For detect easily and not compressible data
>
> Timofey Titovets (3):
> Btrfs: heuristic add simple sampling logic
> Btrfs: heuristic add byte set calculation
> Btrfs: heuristic add byte core set calculation
>
> fs/btrfs/compression.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++-
> fs/btrfs/compression.h | 14 +++++++
> 2 files changed, 123 insertions(+), 2 deletions(-)
>
Overall these look fine, I'd like more verbose changelogs to describe in more
detail the heuristic you are adding. I had to go back and work out what you
were doing a few times because it wasn't clear what each patch was doing.
Thanks,
Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-07-24 15:02 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-24 11:37 [PATCH 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
2017-07-24 11:37 ` [PATCH 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
2017-07-24 14:55 ` Josef Bacik
2017-07-24 11:37 ` [PATCH 2/3] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-07-24 11:37 ` [PATCH 3/3] Btrfs: heuristic add byte core " Timofey Titovets
2017-07-24 15:02 ` [PATCH 0/3] Btrfs: populate heuristic with detection logic Josef Bacik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).