From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Hamish Moffatt <hamish-btrfs@moffatt.email>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: new database files not compressed
Date: Sat, 5 Sep 2020 00:07:33 -0400
Message-ID: <20200905040733.GD5890@hungrycats.org>
In-Reply-To: <0c74cc4a-3644-805c-9501-6888c2a03f24@moffatt.email>

On Fri, Sep 04, 2020 at 06:07:32PM +1000, Hamish Moffatt wrote:
> On 4/9/20 5:44 am, Zygo Blaxell wrote:
> > On Thu, Sep 03, 2020 at 10:53:23PM +1000, Hamish Moffatt wrote:
> >
> > > I recompiled Firebird with fallocate disabled (it has a fallback for
> > > non-linux OSs), and now I have compressed database files.
> > >
> > > It may be that de-duplication suits my application better anyway. Will
> > > compsize tell me how much space is being saved by de-duplication, or is
> > > there another way to find out?
> >
> > Compsize reports "Uncompressed" and "Referenced" columns. "Uncompressed"
> > is the physical size of the uncompressed data (i.e. how many bytes
> > you would need to hold all of the extents on disk without compression
> > but with dedupe). "Referenced" is the logical size of the data, after
> > counting each reference (i.e. how many bytes you would need to hold all
> > of the data without compression or dedupe).
> >
> > The "none" and "zstd" rows will tell you how much dedupe you're getting
> > on uncompressed and compressed extents separately.
>
>
> Great, I have bees running and I see the deduplication in compsize as you
> said.
>
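
For a quick sanity check of those columns, here's a rough sketch in Python
of the arithmetic. The CompsizeRow type, the helper names and the numbers
are all made up for illustration (they are not compsize output or any real
API). It assumes sizes in bytes and that every extent is fully referenced;
if parts of extents are unreachable, "Referenced" can come out smaller than
it otherwise would, so the dedupe figure understates the saving and can even
go negative.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class CompsizeRow:
    """One row of compsize output, all sizes in bytes (hypothetical type)."""
    kind: str          # "none", "zstd", ...
    disk_usage: int    # bytes actually occupied on disk
    uncompressed: int  # bytes the extents occupy once decompressed
    referenced: int    # bytes of file data that point at those extents


def compression_saved(row: CompsizeRow) -> int:
    """Bytes saved by compression: uncompressed extent size minus disk usage."""
    return row.uncompressed - row.disk_usage


def dedupe_saved(row: CompsizeRow) -> int:
    """Bytes saved by shared references: referenced minus uncompressed."""
    return row.referenced - row.uncompressed


# Made-up numbers, not taken from a real filesystem.
rows = [
    CompsizeRow("none", 1000 * MIB, 1000 * MIB, 1500 * MIB),
    CompsizeRow("zstd", 400 * MIB, 800 * MIB, 1200 * MIB),
]

for row in rows:
    print(f"{row.kind}: compression saved {compression_saved(row) // MIB} MiB, "
          f"dedupe saved {dedupe_saved(row) // MIB} MiB")
```

With these made-up numbers the "none" row saves 500 MiB through dedupe and
nothing through compression, and the "zstd" row saves 400 MiB each way.
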
> What is the appropriate place to ask questions about bees - here, github or
> elsewhere?

I'm in all three places, though I might miss it if it's only posted to
the linux-btrfs list. If you need to send a log file and you don't want it
to be fully public, there's an email address in the bees README.

> I added some files, restarted bees and it ran a deduplication, but then I
> added some more files (8 hours ago) and there's been some regular logging
> but the new files haven't been deduplicated.

Information we'd need for this:

	- kernel version you're running

	- bees log, preferably at a level high enough to see the individual
	  dedupe ops

	- btrfs inspect-internal dump-tree output for the subvol trees
	  containing the files

or a small reproducer.

If there's something like this in the log:

	2020-09-05 04:03:13 4464.4468<5> crawl_5: WORKAROUND: toxic address: addr = 0x544eb000, sys_usage_delta = 0.136, user_usage_delta = 0.02, rt_age = 0.251574, refs 1
	2020-09-05 04:03:13 4464.4468<4> crawl_5: WORKAROUND: discovered toxic match at found_addr 0x544eb000 matching bbd BeesBlockData { 4K 0x1db000 fd = 9 '/try/31264-2', address = 0x54823000, hash = 0x9a2052ab5fe19ae, data[4096] }

then it's bees detecting btrfs taking too long to do an extent lookup, and
blacklisting the extent to avoid crippling performance problems. On kernel
5.7 it seems to be happening a lot, which is ironic given that 5.7 has much
faster backref code.
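
To get a feel for how often this happens, the two message types can be
counted per address. A minimal sketch in Python, assuming the log lines look
exactly like the two above (other bees versions may format them differently);
count_toxic and the regex names are made up for illustration.

```python
import re
from collections import Counter

# Patterns derived only from the two sample log lines; this format is an assumption.
TOXIC_ADDR = re.compile(r"WORKAROUND: toxic address: addr = (0x[0-9a-fA-F]+)")
TOXIC_MATCH = re.compile(
    r"WORKAROUND: discovered toxic match at found_addr (0x[0-9a-fA-F]+)"
)


def count_toxic(lines):
    """Return (new_toxic, matches): per-address counts of each message type."""
    new_toxic = Counter()  # extent was just marked toxic
    matches = Counter()    # a later block matched an already-toxic extent
    for line in lines:
        m = TOXIC_ADDR.search(line)
        if m:
            new_toxic[int(m.group(1), 16)] += 1
            continue
        m = TOXIC_MATCH.search(line)
        if m:
            matches[int(m.group(1), 16)] += 1
    return new_toxic, matches


SAMPLE = [
    "2020-09-05 04:03:13 4464.4468<5> crawl_5: WORKAROUND: toxic address: "
    "addr = 0x544eb000, sys_usage_delta = 0.136, user_usage_delta = 0.02, "
    "rt_age = 0.251574, refs 1",
    "2020-09-05 04:03:13 4464.4468<4> crawl_5: WORKAROUND: discovered toxic "
    "match at found_addr 0x544eb000 matching bbd BeesBlockData { 4K 0x1db000 "
    "fd = 9 '/try/31264-2', address = 0x54823000, hash = 0x9a2052ab5fe19ae, "
    "data[4096] }",
]

new_toxic, matches = count_toxic(SAMPLE)
for addr in sorted(new_toxic):
    print(f"{addr:#x}: marked toxic {new_toxic[addr]} time(s), "
          f"matched again {matches[addr]} time(s)")
```

To run it over a real log, pass an open file object instead of SAMPLE, since
count_toxic only needs an iterable of lines.
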
> Hamish
>
>