linux-btrfs.vger.kernel.org archive mirror
From: Timofey Titovets <nefelim4ag@gmail.com>
To: linux-btrfs@vger.kernel.org
Cc: Timofey Titovets <nefelim4ag@gmail.com>
Subject: [PATCH v7 0/6] Btrfs: populate heuristic with code
Date: Fri, 25 Aug 2017 12:18:39 +0300
Message-ID: <20170825091845.4120-1-nefelim4ag@gmail.com>

Based on kdave for-next

Patch summary:
1. Make the heuristic use compression workspaces
   A bit tricky, but it works.

2. Add heuristic counters and buffer to workspaces

3. Implement simple input data sampling
   It takes 16-byte samples at 256-byte shifts
   over the input data and collects info about how many
   different bytes (symbols) are found in the sample data

4. Implement a check of the sample for repeated data
   Just iterate over the sample and do memcmp()

5. Add code to calculate
   how many unique bytes are found in the sample data
   That can quickly detect easily compressible data

6. Add code to calculate the byte core set size,
   i.e. how many unique bytes account for 90% of the sample data
   That code requires the counters in the bucket to be sorted
   It can detect easily compressible data with many repeated bytes
   It can detect incompressible data with evenly distributed bytes
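
As an illustration of items 3-6, here is a minimal userspace sketch.
It is not the code from the patches: the function names, the sample
buffer layout, and the exact form of the repeated-data check
(comparing every 16-byte chunk against the first one) are assumptions
made for this example. Only the 16-byte sample length, the 256-byte
shift, and the 90% core set threshold come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SAMPLE_LEN   16   /* bytes copied per sample (from the cover letter) */
#define SAMPLE_SHIFT 256  /* distance between sample starts (from the cover letter) */
#define BUCKET_SIZE  256  /* one counter per possible byte value */

/* Item 3: copy 16-byte samples taken every 256 bytes; returns bytes sampled. */
static size_t take_samples(const uint8_t *in, size_t len,
			   uint8_t *sample, size_t cap)
{
	size_t off, n = 0;

	assert(in && sample);
	for (off = 0; off + SAMPLE_LEN <= len && n + SAMPLE_LEN <= cap;
	     off += SAMPLE_SHIFT) {
		memcpy(sample + n, in + off, SAMPLE_LEN);
		n += SAMPLE_LEN;
	}
	return n;
}

/* Item 4 (assumed form): 1 if every 16-byte chunk equals the first chunk. */
static int sample_is_repeated(const uint8_t *sample, size_t n)
{
	size_t i;

	if (n < 2 * SAMPLE_LEN)
		return 0;
	for (i = SAMPLE_LEN; i + SAMPLE_LEN <= n; i += SAMPLE_LEN)
		if (memcmp(sample, sample + i, SAMPLE_LEN) != 0)
			return 0;
	return 1;
}

/* Count how often each byte value occurs in the sample. */
static void count_bytes(const uint8_t *sample, size_t n, uint32_t *bucket)
{
	size_t i;

	memset(bucket, 0, BUCKET_SIZE * sizeof(*bucket));
	for (i = 0; i < n; i++)
		bucket[sample[i]]++;
}

/* Item 5: number of distinct byte values seen in the sample. */
static unsigned int byte_set_size(const uint32_t *bucket)
{
	unsigned int i, set = 0;

	for (i = 0; i < BUCKET_SIZE; i++)
		if (bucket[i])
			set++;
	return set;
}

/* qsort() comparator: larger counters first. */
static int cmp_desc(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

	return (x < y) - (x > y);
}

/*
 * Item 6: sort the counters in descending order (in place), then count
 * how many of the most frequent byte values cover 90% of the sample.
 */
static unsigned int byte_core_set_size(uint32_t *bucket, size_t n)
{
	uint64_t threshold = (uint64_t)n * 90 / 100, sum = 0;
	unsigned int i;

	qsort(bucket, BUCKET_SIZE, sizeof(*bucket), cmp_desc);
	for (i = 0; i < BUCKET_SIZE && sum < threshold; i++)
		sum += bucket[i];
	return i;
}
```

In this sketch a small byte set or a small core set suggests
compressible data, while a core set close to 256 suggests evenly
distributed bytes. The thresholds used for the final decision are
left out here.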

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in bucket,
    as no one cares where and which symbol is stored
    in the bucket for now

Changes v2 -> v3 (only patch #3 updated):
  - Fix u64 division problem by using u32 for input_size
  - Fix input size calculation: start - end -> end - start
  - Add missing sort.h header

Changes v3 -> v4 (only patch #1 updated):
  - Change counter type in bucket item u16 -> u32
  - Drop other fields from bucket item for now,
    no one uses them

Changes v4 -> v5:
  - Move heuristic code to external file
  - Make heuristic use compression workspaces
  - Add check of sample for zeroes

Changes v5 -> v6:
  - Add some code to handle page-unaligned range start/end
  - Replace sample zeroes check with check for repeated data

Changes v6 -> v7:
  - Add missing part of first patch
  - Make use of IS_ALIGNED() to check tail alignment

Timofey Titovets (6):
  Btrfs: heuristic make use compression workspaces
  Btrfs: heuristic workspace add bucket and sample items
  Btrfs: implement heuristic sampling logic
  Btrfs: heuristic add detection of repeated data patterns
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/Makefile      |   2 +-
 fs/btrfs/compression.c |  18 ++--
 fs/btrfs/compression.h |   7 +-
 fs/btrfs/heuristic.c   | 223 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 237 insertions(+), 13 deletions(-)
 create mode 100644 fs/btrfs/heuristic.c

--
2.14.1


Thread overview: 14+ messages
2017-08-25  9:18 Timofey Titovets [this message]
2017-08-25  9:18 ` [PATCH v7 1/6] Btrfs: heuristic make use compression workspaces Timofey Titovets
2017-09-27 13:12   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 2/6] Btrfs: heuristic workspace add bucket and sample items Timofey Titovets
2017-09-27 13:22   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 3/6] Btrfs: implement heuristic sampling logic Timofey Titovets
2017-09-27 13:38   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 4/6] Btrfs: heuristic add detection of repeated data patterns Timofey Titovets
2017-09-27 13:47   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 5/6] Btrfs: heuristic add byte set calculation Timofey Titovets
2017-09-27 13:50   ` David Sterba
2017-08-25  9:18 ` [PATCH v7 6/6] Btrfs: heuristic add byte core " Timofey Titovets
2017-09-27 13:54   ` David Sterba
2017-09-27 13:56   ` David Sterba
