linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Btrfs: populate heuristic with detection logic
@ 2017-07-26 12:49 Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Timofey Titovets @ 2017-07-26 12:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Based on kdave for-next
As heuristic skeleton already merged
Populate heuristic with basic code.

First patch: add simple sampling code
It's get 16 byte samples with 256 bytes shifts
over input data. Collect info about how many
different bytes (symbols) has been found in sample data

Second patch: add code for calculate
how many unique bytes has been
found in sample data
That can fast detect easy compressible data

Third patch: add code for calculate byte core set size
i.e. how many unique bytes use 90% of sample data
That code require that numbers in bucket must be sorted
That can detect easy compressible data with many repeated bytes
That can detect not compressible data with evenly distributed bytes

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in bucket
    as no one care about where and what symbol stored
    in bucket at now

Timofey Titovets (3):
  Btrfs: heuristic add simple sampling logic
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/compression.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/compression.h |  13 ++++++
 2 files changed, 119 insertions(+), 2 deletions(-)

--
2.13.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] Btrfs: heuristic add simple sampling logic
  2017-07-26 12:49 [PATCH v2 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
@ 2017-07-26 12:49 ` Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 2/3] Btrfs: heuristic add byte set calculation Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 3/3] Btrfs: heuristic add byte core " Timofey Titovets
  2 siblings, 0 replies; 7+ messages in thread
From: Timofey Titovets @ 2017-07-26 12:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Get small sample from input data and calculate
byte type count for that sample into bucket.
Bucket will store info about which bytes
and how many has been detected in sample

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 24 ++++++++++++++++++++++--
 fs/btrfs/compression.h | 10 ++++++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 63f54bd2d5bb..ca7cfaad6e2f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1068,15 +1068,35 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 	u64 index = start >> PAGE_SHIFT;
 	u64 end_index = end >> PAGE_SHIFT;
 	struct page *page;
-	int ret = 1;
+	struct heuristic_bucket_item *bucket;
+	int a, b, ret;
+	u8 symbol, *input_data;
+
+	ret = 1;
+
+	bucket = kcalloc(BTRFS_HEURISTIC_BUCKET_SIZE,
+		sizeof(struct heuristic_bucket_item), GFP_NOFS);
+
+	if (!bucket)
+		goto out;

 	while (index <= end_index) {
 		page = find_get_page(inode->i_mapping, index);
-		kmap(page);
+		input_data = kmap(page);
+		a = 0;
+		while (a < PAGE_SIZE) {
+			for (b = 0; b < BTRFS_HEURISTIC_READ_SIZE; b++) {
+				symbol = input_data[a+b];
+				bucket[symbol].count++;
+			}
+			a += BTRFS_HEURISTIC_ITER_OFFSET;
+		}
 		kunmap(page);
 		put_page(page);
 		index++;
 	}

+out:
+	kfree(bucket);
 	return ret;
 }
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index d1f4eee2d0af..e30a9df1937e 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -129,6 +129,16 @@ struct btrfs_compress_op {
 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;

+struct heuristic_bucket_item {
+       u8  padding;
+       u8  symbol;
+       u16 count;
+};
+
+#define BTRFS_HEURISTIC_READ_SIZE 16
+#define BTRFS_HEURISTIC_ITER_OFFSET 256
+#define BTRFS_HEURISTIC_BUCKET_SIZE 256
+
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);

 #endif
--
2.13.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] Btrfs: heuristic add byte set calculation
  2017-07-26 12:49 [PATCH v2 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
@ 2017-07-26 12:49 ` Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 3/3] Btrfs: heuristic add byte core " Timofey Titovets
  2 siblings, 0 replies; 7+ messages in thread
From: Timofey Titovets @ 2017-07-26 12:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate byte set size for data sample:
Calculate how many unique bytes has been in sample
By count all bytes in bucket with count > 0
If byte set low (~25%), data are easily compressible

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 27 +++++++++++++++++++++++++++
 fs/btrfs/compression.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ca7cfaad6e2f..1429b11f2c5f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1048,6 +1048,27 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 	return 1;
 }

+static inline int byte_set_size(const struct heuristic_bucket_item *bucket)
+{
+	int a = 0;
+	int byte_set_size = 0;
+
+	for (; a < BTRFS_HEURISTIC_BYTE_SET_THRESHOLD; a++) {
+		if (bucket[a].count > 0)
+			byte_set_size++;
+	}
+
+	for (; a < BTRFS_HEURISTIC_BUCKET_SIZE; a++) {
+		if (bucket[a].count > 0) {
+			byte_set_size++;
+			if (byte_set_size > BTRFS_HEURISTIC_BYTE_SET_THRESHOLD)
+				return byte_set_size;
+		}
+	}
+
+	return byte_set_size;
+}
+
 /*
  * Compression heuristic.
  *
@@ -1096,6 +1117,12 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		index++;
 	}

+	a = byte_set_size(bucket);
+	if (a > BTRFS_HEURISTIC_BYTE_SET_THRESHOLD) {
+		ret = 1;
+		goto out;
+	}
+
 out:
 	kfree(bucket);
 	return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index e30a9df1937e..03857967815a 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -138,6 +138,7 @@ struct heuristic_bucket_item {
 #define BTRFS_HEURISTIC_READ_SIZE 16
 #define BTRFS_HEURISTIC_ITER_OFFSET 256
 #define BTRFS_HEURISTIC_BUCKET_SIZE 256
+#define BTRFS_HEURISTIC_BYTE_SET_THRESHOLD 64

 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);

--
2.13.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
  2017-07-26 12:49 [PATCH v2 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
  2017-07-26 12:49 ` [PATCH v2 2/3] Btrfs: heuristic add byte set calculation Timofey Titovets
@ 2017-07-26 12:49 ` Timofey Titovets
  2017-07-29  5:52   ` kbuild test robot
  2017-07-29 11:43   ` kbuild test robot
  2 siblings, 2 replies; 7+ messages in thread
From: Timofey Titovets @ 2017-07-26 12:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Timofey Titovets

Calculate byte core set for data sample:
Sort bucket's numbers in decreasing order
Count how many numbers use 90% of sample
If core set are low (<=25%), data are easily compressible
If core set high (>=80%), data are not compressible

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/compression.h |  2 ++
 2 files changed, 59 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1429b11f2c5f..314cbdd8d175 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1069,6 +1069,42 @@ static inline int byte_set_size(const struct heuristic_bucket_item *bucket)
 	return byte_set_size;
 }

+/* For bucket sorting */
+static inline int heuristic_bucket_compare(const void *lv, const void *rv)
+{
+	struct heuristic_bucket_item *l = (struct heuristic_bucket_item *)(lv);
+	struct heuristic_bucket_item *r = (struct heuristic_bucket_item *)(rv);
+
+	return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ */
+static inline int byte_core_set_size(struct heuristic_bucket_item *bucket,
+				     u32 core_set_threshold)
+{
+	int a = 0;
+	u32 coreset_sum = 0;
+
+	for (; a < BTRFS_HEURISTIC_BYTE_CORE_SET_LOW; a++)
+		coreset_sum += bucket[a].count;
+
+	if (coreset_sum > core_set_threshold)
+		return a;
+
+	for (; a < BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH; a++) {
+		if (bucket[a].count == 0)
+			break;
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return a;
+}
+
 /*
  * Compression heuristic.
  *
@@ -1092,6 +1128,8 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 	struct heuristic_bucket_item *bucket;
 	int a, b, ret;
 	u8 symbol, *input_data;
+	u32 core_set_threshold;
+	u64 input_size = start - end;

 	ret = 1;

@@ -1123,6 +1161,25 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		goto out;
 	}

+	/* Sort in reverse order */
+	sort(bucket, BTRFS_HEURISTIC_BUCKET_SIZE,
+	     sizeof(struct heuristic_bucket_item), &heuristic_bucket_compare,
+	     NULL);
+
+	core_set_threshold = input_size*90/BTRFS_HEURISTIC_ITER_OFFSET/100;
+	core_set_threshold *= BTRFS_HEURISTIC_READ_SIZE;
+
+	a = byte_core_set_size(bucket, core_set_threshold);
+	if (a <= BTRFS_HEURISTIC_BYTE_CORE_SET_LOW) {
+		ret = 2;
+		goto out;
+	}
+
+	if (a >= BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH) {
+		ret = 0;
+		goto out;
+	}
+
 out:
 	kfree(bucket);
 	return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 03857967815a..0fcd1a485adb 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -139,6 +139,8 @@ struct heuristic_bucket_item {
 #define BTRFS_HEURISTIC_ITER_OFFSET 256
 #define BTRFS_HEURISTIC_BUCKET_SIZE 256
 #define BTRFS_HEURISTIC_BYTE_SET_THRESHOLD 64
+#define BTRFS_HEURISTIC_BYTE_CORE_SET_LOW  BTRFS_HEURISTIC_BYTE_SET_THRESHOLD
+#define BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH 200 // 80%

 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);

--
2.13.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
  2017-07-26 12:49 ` [PATCH v2 3/3] Btrfs: heuristic add byte core " Timofey Titovets
@ 2017-07-29  5:52   ` kbuild test robot
  2017-07-29 11:43   ` kbuild test robot
  1 sibling, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2017-07-29  5:52 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: kbuild-all, linux-btrfs, Timofey Titovets

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

Hi Timofey,

[auto build test ERROR on next-20170724]
[cannot apply to btrfs/next v4.13-rc2 v4.13-rc1 v4.12 v4.13-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-populate-heuristic-with-detection-logic/20170729-061208
config: i386-randconfig-n0-201730 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/btrfs/compression.o: In function `btrfs_compress_heuristic':
>> compression.c:(.text+0x2208): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30501 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
  2017-07-26 12:49 ` [PATCH v2 3/3] Btrfs: heuristic add byte core " Timofey Titovets
  2017-07-29  5:52   ` kbuild test robot
@ 2017-07-29 11:43   ` kbuild test robot
  2017-07-29 13:30     ` Timofey Titovets
  1 sibling, 1 reply; 7+ messages in thread
From: kbuild test robot @ 2017-07-29 11:43 UTC (permalink / raw)
  To: Timofey Titovets; +Cc: kbuild-all, linux-btrfs, Timofey Titovets

[-- Attachment #1: Type: text/plain, Size: 947 bytes --]

Hi Timofey,

[auto build test ERROR on next-20170724]
[cannot apply to btrfs/next v4.13-rc2 v4.13-rc1 v4.12 v4.13-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-populate-heuristic-with-detection-logic/20170729-061208
config: arm-arm5 (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

>> ERROR: "__aeabi_uldivmod" [fs/btrfs/btrfs.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28714 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
  2017-07-29 11:43   ` kbuild test robot
@ 2017-07-29 13:30     ` Timofey Titovets
  0 siblings, 0 replies; 7+ messages in thread
From: Timofey Titovets @ 2017-07-29 13:30 UTC (permalink / raw)
  To: kbuild test robot; +Cc: kbuild-all, linux-btrfs

2017-07-29 14:43 GMT+03:00 kbuild test robot <lkp@intel.com>:
> Hi Timofey,
>
> [auto build test ERROR on next-20170724]
> [cannot apply to btrfs/next v4.13-rc2 v4.13-rc1 v4.12 v4.13-rc2]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-populate-heuristic-with-detection-logic/20170729-061208
> config: arm-arm5 (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm
>
> All errors (new ones prefixed by >>):
>
>>> ERROR: "__aeabi_uldivmod" [fs/btrfs/btrfs.ko] undefined!
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

I will fix 64bit division and resend patch set

Thanks.

-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-07-29 13:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-26 12:49 [PATCH v2 0/3] Btrfs: populate heuristic with detection logic Timofey Titovets
2017-07-26 12:49 ` [PATCH v2 1/3] Btrfs: heuristic add simple sampling logic Timofey Titovets
2017-07-26 12:49 ` [PATCH v2 2/3] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-07-26 12:49 ` [PATCH v2 3/3] Btrfs: heuristic add byte core " Timofey Titovets
2017-07-29  5:52   ` kbuild test robot
2017-07-29 11:43   ` kbuild test robot
2017-07-29 13:30     ` Timofey Titovets

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).