public inbox for linux-btrfs@vger.kernel.org
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Hamish Moffatt <hamish-btrfs@moffatt.email>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: new database files not compressed
Date: Sat, 5 Sep 2020 00:07:33 -0400
Message-ID: <20200905040733.GD5890@hungrycats.org>
In-Reply-To: <0c74cc4a-3644-805c-9501-6888c2a03f24@moffatt.email>

On Fri, Sep 04, 2020 at 06:07:32PM +1000, Hamish Moffatt wrote:
> On 4/9/20 5:44 am, Zygo Blaxell wrote:
> > On Thu, Sep 03, 2020 at 10:53:23PM +1000, Hamish Moffatt wrote:
> > 
> > > I recompiled Firebird with fallocate disabled (it has a fallback for
> > > non-linux OSs), and now I have compressed database files.
> > > 
> > > It may be that de-duplication suits my application better anyway. Will
> > > compsize tell me how much space is being saved by de-duplication, or is
> > > there another way to find out?
> > Compsize reports "Uncompressed" and "Referenced" columns.  "Uncompressed"
> > is the physical size of the uncompressed data (i.e. how many bytes
> > you would need to hold all of the extents on disk without compression
> > but with dedupe).  "Referenced" is the logical size of the data, after
> > counting each reference (i.e. how many bytes you would need to hold all
> > of the data without compression or dedupe).
> > 
> > The "none" and "zstd" rows will tell you how much dedupe you're getting
> > on uncompressed and compressed extents separately.
> 
> 
> Great, I have bees running and I see the deduplication in compsize as you
> said.
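
As a rough illustration of the column definitions quoted above, the
space saved by dedupe is approximately "Referenced" minus "Uncompressed"
for the same row.  This is only a sketch: the function name and the byte
counts are made up for the example and are not output from a real
compsize run, and the simple difference ignores extents that are only
partially referenced.

```python
from typing import Tuple


def dedupe_savings(uncompressed_bytes: int, referenced_bytes: int) -> Tuple[int, float]:
    """Return (bytes saved by dedupe, fraction of referenced data saved).

    uncompressed_bytes: the "Uncompressed" column (extents counted once).
    referenced_bytes:   the "Referenced" column (every reference counted).
    """
    saved = referenced_bytes - uncompressed_bytes
    fraction = saved / referenced_bytes if referenced_bytes else 0.0
    return saved, fraction


# Hypothetical numbers, not taken from any real filesystem.
saved, fraction = dedupe_savings(uncompressed_bytes=6 * 2**30,
                                 referenced_bytes=8 * 2**30)
print(f"dedupe saves {saved / 2**30:.1f} GiB ({fraction:.0%} of referenced data)")
```

Applying the same arithmetic to the "none" and "zstd" rows separately
gives the per-row savings.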
> 
> What is the appropriate place to ask questions about bees - here, GitHub or
> elsewhere?

I'm in all three places, though I might miss a question if it's only
posted to the linux-btrfs list.  If you need to send a log file and you
don't want it to be fully public, there's an email address in the bees
README.

> I added some files, restarted bees and it ran a deduplication, but then I
> added some more files (8 hours ago) and there's been some regular logging
> but the new files haven't been deduplicated.

Information we'd need for this:

	- kernel version you're running

	- bees log, preferably at a level high enough to see the individual
	dedupe ops

	- btrfs dump-tree of the subvol trees containing the files

or a small reproducer.
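
A minimal sketch of gathering the first and third items, assuming the
subvolume tree IDs are already known.  The device path and the tree ID
below are placeholders, and the helper name is made up for the example.
It only builds the `btrfs inspect-internal dump-tree` command lines and
prints them; running them typically needs root.

```python
import platform
from typing import List


def dump_tree_commands(device: str, subvol_ids: List[int]) -> List[List[str]]:
    """Build (but do not run) one dump-tree command per subvolume tree."""
    return [
        ["btrfs", "inspect-internal", "dump-tree", "-t", str(tree_id), device]
        for tree_id in subvol_ids
    ]


# Kernel release of the machine this runs on (same value as `uname -r`).
print("kernel:", platform.release())

# "/dev/sdX" and tree ID 256 are placeholders; substitute your own values.
for argv in dump_tree_commands("/dev/sdX", [256]):
    print(" ".join(argv))
```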

If there's something like this in the log:

	2020-09-05 04:03:13 4464.4468<5> crawl_5: WORKAROUND: toxic address: addr = 0x544eb000, sys_usage_delta = 0.136, user_usage_delta = 0.02, rt_age = 0.251574, refs 1
	2020-09-05 04:03:13 4464.4468<4> crawl_5: WORKAROUND: discovered toxic match at found_addr 0x544eb000 matching bbd BeesBlockData { 4K 0x1db000 fd = 9 '/try/31264-2', address = 0x54823000, hash = 0x9a2052ab5fe19ae, data[4096] }

then bees has detected btrfs taking too long to do an extent lookup, and
is blacklisting the extent to avoid crippling performance problems.  On
kernel 5.7 this seems to be happening a lot, which is ironic given that
5.7 has much faster backref code.
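
To check a log for these messages, something like the following sketch
could be used.  The regular expression is derived only from the two
sample lines above, so other bees versions may word the messages
differently; the function name is made up for the example.

```python
import re
from collections import Counter

# Matches the address in either of the two "toxic" messages shown above.
TOXIC_RE = re.compile(
    r"WORKAROUND: (?:toxic address: addr = |discovered toxic match at found_addr )"
    r"(0x[0-9a-fA-F]+)"
)


def count_toxic(lines):
    """Count toxic-extent log lines per extent address."""
    counts = Counter()
    for line in lines:
        match = TOXIC_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


sample = [
    "2020-09-05 04:03:13 4464.4468<5> crawl_5: WORKAROUND: toxic address: "
    "addr = 0x544eb000, sys_usage_delta = 0.136, user_usage_delta = 0.02, "
    "rt_age = 0.251574, refs 1",
    "2020-09-05 04:03:13 4464.4468<4> crawl_5: WORKAROUND: discovered toxic "
    "match at found_addr 0x544eb000 matching bbd BeesBlockData { 4K 0x1db000 "
    "fd = 9 '/try/31264-2', address = 0x54823000, hash = 0x9a2052ab5fe19ae, "
    "data[4096] }",
]

for addr, n in count_toxic(sample).items():
    print(f"{addr:#x}: {n} toxic log line(s)")
```

For a real log, pass an open file object instead of the sample list.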


> Hamish
> 
> 
