linux-btrfs.vger.kernel.org archive mirror
From: Chris Murphy <lists@colorremedies.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Chris Murphy <lists@colorremedies.com>,
	Goffredo Baroncelli <kreijack@inwind.it>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: is BTRFS_IOC_DEFRAG behavior optimal?
Date: Thu, 11 Feb 2021 01:46:07 -0700
Message-ID: <CAJCQCtQnvVUVhTPQ1v=mw=jDBbsHb5HAa5=s3E+AWuBD7pSO2g@mail.gmail.com>
In-Reply-To: <20210211061220.GF32440@hungrycats.org>

On Wed, Feb 10, 2021 at 11:12 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:


>
> If we want the data compressed (and who doesn't?  journal data compresses
> 8:1 with btrfs zstd) then we'll always need to make a copy at close.
> Because systemd used prealloc, the copy is necessarily to a new inode,
> as there's no way to re-enable compression on an inode once prealloc
> is used (this has deep disk-format reasons, but not as deep as the
> nodatacow ones).

Pretty sure sd-journald still fallocates when the journals are datacow,
which you get by touching /etc/tmpfiles.d/journal-nocow.conf.

And I know for sure those datacow files do compress on rotation.

Preallocated datacow might not be so bad if it weren't for that one
damn header or indexing block, whatever the proper term is, that
sd-journald hammers every time it fsyncs. I don't know if I wanna know
what it means to snapshot a datacow file that's prealloc. But in
theory if the same blocks weren't all being hammered, a preallocated
file shouldn't fragment like hell if each prealloc block gets just one
write.


> If we don't care about compression or datasums, then keep the file
> nodatacow and do nothing at close.  The defrag isn't needed and the
> FS_NOCOW_FL flag change doesn't work.

Agreed.


> It makes sense for SSD too.  It's 4K extents, so the metadata and small-IO
> overheads will be non-trivial even on SSD.  Deleting or truncating datacow
> journal files will put a lot of tiny free space holes into the filesystem.
> It will flood the next commit with delayed refs and push up latency.

I haven't seen meaningful latency on a single journal file, datacow
and heavily fragmented, on ssd. But to test on more than one file at a
time I need to revert the defrag commits, and build systemd, and let a
bunch of journals accumulate somehow. If I dump too much data
artificially to try and mimic aging, I know I will get nowhere near as
many of those 4KiB extents. So I dunno.
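
One way to check how many of those 4KiB extents a given journal really
has is to parse `filefrag -v` output. A rough Python sketch, assuming
the usual e2fsprogs column layout (ext, logical range, physical range,
length in blocks) and 4096-byte blocks. parse_filefrag and count_small
are made-up helper names, and the sample text is invented for
illustration, not a real measurement:

```python
import re

# Matches the extent rows of `filefrag -v`, e.g.
#    1:        1..       1:     200000..    200000:      1:     100001:
# Assumption: columns are ext, logical start..end, physical start..end, length.
EXTENT_ROW = re.compile(
    r"^[ \t]*(\d+):[ \t]*(\d+)\.\.[ \t]*(\d+):"
    r"[ \t]*(\d+)\.\.[ \t]*(\d+):[ \t]*(\d+):",
    re.MULTILINE,
)


def parse_filefrag(text):
    """Return a list of (logical_start, physical_start, length) in blocks."""
    return [
        (int(m.group(2)), int(m.group(4)), int(m.group(6)))
        for m in EXTENT_ROW.finditer(text)
    ]


def count_small(extents, max_blocks=1):
    """Count extents no longer than max_blocks (1 block = 4KiB assumed)."""
    return sum(1 for _, _, length in extents if length <= max_blocks)


# Invented sample output, only to exercise the parser.
SAMPLE = """\
Filesystem type is: 9123683e
File size of system.journal is 16384 (4 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:     100000..    100000:      1:
   1:        1..       1:     200000..    200000:      1:     100001:
   2:        2..       3:     300000..    300001:      2:     200001: last,eof
system.journal: 3 extents found
"""

extents = parse_filefrag(SAMPLE)
print(len(extents), "extents,", count_small(extents), "of them 4KiB")
```

On a real file the text would come from something like
subprocess.run(["filefrag", "-v", path], capture_output=True, text=True).stdout,
and compressed files will need more care since their extents are
reported differently.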


>
> > In that case the fragmentation is
> > quite considerable, hundreds to thousands of extents. It's
> > sufficiently bad that it'd probably be better if they were
> > defragmented automatically with a trigger that tests for number of
> > non-contiguous small blocks that somehow cheaply estimates latency
> > reading all of them.
>
> Yeah it would be nice if autodefrag could be made to not suck.

It triggers on inserts, not appends. So it doesn't do anything for the
sd-journald case.

I would think the active journals are the ones more likely to get
searched for recent events than archived journals. So in the datacow
case, you only get relief once it's rotated. It'd be nice to find a
decent, not necessarily perfect, way for them to not get so fragmented
in the first place. Or just defrag once a file has 16M of
non-contiguous extents.

Estimating extents though is another issue, especially with compression enabled.
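
To make the "16M of non-contiguous extents" idea concrete, here's a
rough Python sketch. It assumes the extent list has already been
obtained somehow (FIEMAP, filefrag, whatever), counts an extent as
non-contiguous when it doesn't start where the previous one ended on
disk, and ignores compression entirely. Extent, noncontiguous_bytes and
should_defrag are made-up names, not anything in btrfs or systemd:

```python
from typing import Iterable, NamedTuple

MIB = 1024 * 1024


class Extent(NamedTuple):
    logical: int   # byte offset within the file
    physical: int  # byte offset on the device
    length: int    # extent length in bytes


def noncontiguous_bytes(extents: Iterable[Extent]) -> int:
    """Sum the bytes in extents that do not physically follow their predecessor.

    The first extent has no predecessor and is not counted.
    """
    total = 0
    prev = None
    for ext in sorted(extents, key=lambda e: e.logical):
        if prev is not None and ext.physical != prev.physical + prev.length:
            total += ext.length
        prev = ext
    return total


def should_defrag(extents: Iterable[Extent], threshold: int = 16 * MIB) -> bool:
    """Hypothetical trigger: defrag once the scattered bytes reach threshold."""
    return noncontiguous_bytes(extents) >= threshold


# 4097 scattered 4KiB extents: every one after the first counts, giving 16MiB.
scattered = [Extent(i * 4096, i * 8192, 4096) for i in range(4097)]
print(should_defrag(scattered))
```

This is only the bookkeeping. Whether 16M is the right threshold, and
how to count compressed extents, is exactly the part I don't have an
answer for.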

-- 
Chris Murphy
