From: kreijack@inwind.it
To: Goffredo Baroncelli <kreijack@inwind.it>
Cc: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: is BTRFS_IOC_DEFRAG behavior optimal?
Date: Wed, 10 Feb 2021 22:08:37 -0500
Message-ID: <20210211030836.GE32440@hungrycats.org>
In-Reply-To: <4b01d738-5930-1100-03a4-6f1b7af445e5@inwind.it>

On Wed, Feb 10, 2021 at 08:14:09PM +0100, Goffredo Baroncelli wrote:
> Hi Chris,
> 
> it seems that systemd-journald is smarter/more complex than I thought:
> 
> 1) systemd-journald sets the "live" journal as NOCOW; *when* (see below) it
> closes the files, it marks them as COW again and then defragments them [1]
> 
> 2) looking at the code, I suspect that systemd-journald closes the
> file asynchronously [2]. This means that looking at the "live" journal
> is not sufficient. In fact:
> 
> /var/log/journal/e84907d099904117b355a99c98378dca$ sudo lsattr $(ls -rt *)
> [...]
> --------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd4f-0005baed61106a18.journal
> --------------------- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000bd64-0005baed659feff4.journal
> --------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd67-0005baed65a0901f.journal
> ---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cc63-0005bafed4f12f0a.journal
> ---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cc85-0005baff0ce27e49.journal
> ---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cd38-0005baffe9080b4d.journal
> ---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cd3b-0005baffe908f244.journal
> ---------------C----- user-1000.journal
> ---------------C----- system.journal
> 
> The output above means that the last 6 files are "pending" defragmentation. When they are
> "closed", the NOCOW flag will be removed and defragmentation will start.

Wait what?

> Now my journals have few extents (2 or 3). But I have seen cases where the more recent files
> have hundreds of extents, and after a few "journalctl --rotate" runs the older files become less
> fragmented.
> 
> [1] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L383

That line doesn't work, and systemd ignores the error.

The NOCOW flag cannot be set or cleared unless the file is empty.
This is checked in btrfs_ioctl_setflags.
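
For illustration only (this is not code from systemd or btrfs-progs), a
minimal Python sketch that tries to clear the flag on an existing file and
then reads the flags back, since a successful return from the ioctl does
not by itself show that the bit changed. The ioctl numbers are hard-coded
for x86-64/aarch64 and are an assumption on other ABIs; the helper names
(`clear_nocow_and_verify` and friends) are made up for this example:

```python
import fcntl
import struct
import sys

# Assumption: ioctl numbers as encoded on x86-64 and aarch64 Linux
# (_IOR('f', 1, long) and _IOW('f', 2, long)); other ABIs may differ.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # the 'C' attribute shown by lsattr


def has_nocow(flags):
    """True if the NOCOW bit is set in the inode flags."""
    return bool(flags & FS_NOCOW_FL)


def without_nocow(flags):
    """Return the inode flags with the NOCOW bit cleared."""
    return flags & ~FS_NOCOW_FL


def get_flags(fd):
    """Read the inode flags; the kernel fills in a 32-bit value."""
    buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
    return struct.unpack("I", buf[:4])[0]


def set_flags(fd, flags):
    """Ask the kernel to replace the inode flags."""
    fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("I", flags))


def clear_nocow_and_verify(path):
    """Try to clear NOCOW, then report whether the bit is really gone.

    Raises OSError if the filesystem rejects either ioctl.
    """
    with open(path, "rb") as f:
        fd = f.fileno()
        set_flags(fd, without_nocow(get_flags(fd)))
        return not has_nocow(get_flags(fd))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        ok = clear_nocow_and_verify(path)
        print(path, "NOCOW cleared" if ok else "NOCOW still set")
```

Run against a non-empty journal file on btrfs, I would expect this to
report that the flag is still set.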

This is not something that can be changed easily--if the NOCOW bit is
cleared on a non-empty file, btrfs data read code will expect csums
that aren't present on disk because they were written while the file was
NODATASUM, and the reads will fail pretty badly.  The entire file would
have to have csums added or removed at the same time as the flag change
(or all nodatacow file reads take a performance hit looking for csums
that may or may not be present).

At file close, systemd should copy the data to a new file with no
special attributes and discard or recycle the old inode.  This copy
will be mostly contiguous and have desirable properties like csums and
compression, and will have iops equivalent to btrfs fi defrag.
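
A rough sketch of that copy-and-replace step, in Python for brevity. It is
an illustration under assumptions, not what systemd does today:
`rewrite_as_plain_file` and the `.rewrite-` prefix are hypothetical names,
and the sketch assumes nothing else is writing to the file. It uses a plain
read/write loop so that new extents are allocated; a reflink copy would
just share the old fragmented extents.

```python
import contextlib
import os
import shutil
import tempfile


def rewrite_as_plain_file(path):
    """Copy path into a fresh inode in the same directory, then rename over it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Caveat: on btrfs a new file inherits NOCOW from a directory marked +C,
    # so in that case the flag would have to be cleared here, while the new
    # file is still empty.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Plain buffered copy in 1 MiB chunks.
            shutil.copyfileobj(src, dst, 1024 * 1024)
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copymode(path, tmp)  # keep the original permission bits
        os.replace(tmp, path)       # atomically swap in the new inode
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```

The old inode goes away once its last open file descriptor is closed.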

> [2] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L3687
> 
> On 2/10/21 7:37 AM, Chris Murphy wrote:
> > This is an active (but idle) system.journal file. That is, it's open
> > but not being written to. I did a sync right before this:
> > 
> > https://pastebin.com/jHh5tfpe
> > 
> > And then: btrfs fi defrag -l 8M system.journal
> > 
> > https://pastebin.com/Kq1GjJuh
> > 
> > Looks like most of it was a no op. So it seems btrfs in this case is
> > not confused by so many small extent items, it knows they are
> > contiguous?
> > 
> > It doesn't answer the question what the "too small" threshold is for
> > BTRFS_IOC_DEFRAG, which is what sd-journald is using, though.
> > 
> > Another sync, and then, 'journalctl --rotate' and the resulting
> > archived file is now:
> > 
> > https://pastebin.com/aqac0dRj
> > 
> > These are not the same results between the two ioctls for the same
> > file, and not the same result as what you get with -l 32M (which I do
> > get if I use the default 32M). The BTRFS_IOC_DEFRAG interleaved result
> > is peculiar, but I don't think we can say it's ineffective, it might
> > be an intentional no op either because it's nodatacow or it sees that
> > these many extents are mostly contiguous and not worth defragmenting
> > (which would be good for keeping write amplification down).
> > 
> > So I don't know, maybe it's not wrong.
> > 
> > --
> > Chris Murphy
> > 
> 
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
