From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Hamish Moffatt <hamish-btrfs@moffatt.email>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: new database files not compressed
Date: Sat, 5 Sep 2020 00:07:33 -0400
Message-ID: <20200905040733.GD5890@hungrycats.org>
In-Reply-To: <0c74cc4a-3644-805c-9501-6888c2a03f24@moffatt.email>

On Fri, Sep 04, 2020 at 06:07:32PM +1000, Hamish Moffatt wrote:
> On 4/9/20 5:44 am, Zygo Blaxell wrote:
> > On Thu, Sep 03, 2020 at 10:53:23PM +1000, Hamish Moffatt wrote:
> >
> > > I recompiled Firebird with fallocate disabled (it has a fallback for
> > > non-linux OSs), and now I have compressed database files.
> > >
> > > It may be that de-duplication suits my application better anyway. Will
> > > compsize tell me how much space is being saved by de-duplication, or is
> > > there another way to find out?
> > Compsize reports "Uncompressed" and "Referenced" columns. "Uncompressed"
> > is the physical size of the uncompressed data (i.e. how many bytes
> > you would need to hold all of the extents on disk without compression
> > but with dedupe). "Referenced" is the logical size of the data, after
> > counting each reference (i.e. how many bytes you would need to hold all
> > of the data without compression or dedupe).
> >
> > The "none" and "zstd" rows will tell you how much dedupe you're getting
> > on uncompressed and compressed extents separately.
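
As a rough worked example of the column arithmetic described above: the
function name and the numbers below are made up for illustration, and
only the three compsize column names ("Disk Usage", "Uncompressed",
"Referenced") come from the tool itself.

```python
def savings(disk_usage, uncompressed, referenced):
    """Split the total saving into compression and dedupe parts.

    All three arguments are byte counts taken from one row of compsize
    output (hypothetical values here, not real measurements).
    """
    return {
        # bytes saved by compression alone
        "compression": uncompressed - disk_usage,
        # bytes saved by sharing extents between references
        "dedupe": referenced - uncompressed,
        # combined saving versus storing every reference uncompressed
        "total": referenced - disk_usage,
    }


# Made-up figures in GiB, purely to show the arithmetic.
example = savings(disk_usage=40, uncompressed=100, referenced=150)
print(example)
```

Note that the "dedupe" figure can come out negative if extents still
hold blocks that no file references any more, so treat it as an
estimate rather than an exact count.
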
>
>
> Great, I have bees running and I see the deduplication in compsize as you
> said.
>
> What is the appropriate place to ask questions about bees - here, GitHub or
> elsewhere?

I'm in all three places, though I might miss it if it's only posted to
the linux-btrfs list. If you need to send a log file and you don't want it
to be fully public, there's an email address in the bees README.

> I added some files, restarted bees and it ran a deduplication, but then I
> added some more files (8 hours ago) and there's been some regular logging
> but the new files haven't been deduplicated.

Information we'd need for this:

	- kernel version you're running

	- bees log, preferably at a level high enough to see the individual
	  dedupe ops

	- btrfs inspect-internal dump-tree of the subvol trees containing
	  the files

or a small reproducer.

If there's something like this in the log:
2020-09-05 04:03:13 4464.4468<5> crawl_5: WORKAROUND: toxic address: addr = 0x544eb000, sys_usage_delta = 0.136, user_usage_delta = 0.02, rt_age = 0.251574, refs 1
2020-09-05 04:03:13 4464.4468<4> crawl_5: WORKAROUND: discovered toxic match at found_addr 0x544eb000 matching bbd BeesBlockData { 4K 0x1db000 fd = 9 '/try/31264-2', address = 0x54823000, hash = 0x9a2052ab5fe19ae, data[4096] }
then bees has detected btrfs taking too long to do an extent lookup, and
has blacklisted the extent to avoid crippling performance problems. On
kernel 5.7 this seems to be happening a lot, which is ironic given that
5.7 has much faster backref code.
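
To check how often this is happening, something like the following
could be run over the log. This is only a sketch: it assumes the
"toxic address" line format shown above, and the regex, function name
and sample lines are my own illustration, not part of bees.

```python
import re
from collections import Counter

# Matches the "toxic address" workaround line in the format quoted above.
TOXIC_RE = re.compile(r"WORKAROUND: toxic address: addr = (0x[0-9a-fA-F]+)")


def toxic_addresses(lines):
    """Count how many times each extent address was flagged as toxic."""
    hits = Counter()
    for line in lines:
        match = TOXIC_RE.search(line)
        if match:
            hits[int(match.group(1), 16)] += 1
    return hits


# Sample input: the first line is copied from the log excerpt above,
# the second is a made-up non-matching line.
sample = [
    "2020-09-05 04:03:13 4464.4468<5> crawl_5: WORKAROUND: toxic address: "
    "addr = 0x544eb000, sys_usage_delta = 0.136, user_usage_delta = 0.02, "
    "rt_age = 0.251574, refs 1",
    "2020-09-05 04:03:14 4464.4468<6> crawl_5: some unrelated message",
]

for addr, count in toxic_addresses(sample).items():
    print(f"{addr:#x}: flagged {count} time(s)")
```

If most of the new files' extents show up in that count, that would
explain why they are not being deduplicated.
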
> Hamish
>
>