From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: kreijack@inwind.it
Cc: Chris Murphy <lists@colorremedies.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: is BTRFS_IOC_DEFRAG behavior optimal?
Date: Wed, 10 Feb 2021 22:13:06 -0500
Message-ID: <20210211031306.GL28049@hungrycats.org>
In-Reply-To: <20210211030836.GE32440@hungrycats.org>
Sorry, I busted my mail client. That was from me. :-P
On Wed, Feb 10, 2021 at 10:08:37PM -0500, kreijack@inwind.it wrote:
> On Wed, Feb 10, 2021 at 08:14:09PM +0100, Goffredo Baroncelli wrote:
> > Hi Chris,
> >
> > it seems that systemd-journald is smarter and more complex than I thought:
> >
> > 1) systemd-journald sets the "live" journal as NOCOW; *when* (see below) it
> > closes the files, it marks them as COW again and then defragments them [1]
> >
> > 2) looking at the code, I suspect that systemd-journald closes the
> > file asynchronously [2]. This means that looking at the "live" journal
> > is not sufficient. In fact:
> >
> > /var/log/journal/e84907d099904117b355a99c98378dca$ sudo lsattr $(ls -rt *)
> > [...]
> > --------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd4f-0005baed61106a18.journal
> > --------------------- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000bd64-0005baed659feff4.journal
> > --------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd67-0005baed65a0901f.journal
> > ---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cc63-0005bafed4f12f0a.journal
> > ---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cc85-0005baff0ce27e49.journal
> > ---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cd38-0005baffe9080b4d.journal
> > ---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cd3b-0005baffe908f244.journal
> > ---------------C----- user-1000.journal
> > ---------------C----- system.journal
> >
> > The output above means that the last 6 files are "pending" de-fragmentation. When they are
> > "closed", the NOCOW flag will be removed and a defragmentation will start.
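For readers unfamiliar with the lsattr output quoted above: the `C` column is the NOCOW attribute. A minimal sketch, assuming the two-column `attributes filename` layout shown in the listing, that picks out the files still carrying the flag. The function name `nocow_files` is made up for illustration and is not part of any tool.

```python
def nocow_files(lsattr_output: str) -> list[str]:
    """Return the names of files whose lsattr attribute field contains 'C'."""
    names = []
    for line in lsattr_output.splitlines():
        # Each useful line is "<attribute field> <file name>".
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skips blank lines and markers such as "[...]"
        attrs, name = parts
        if "C" in attrs:
            names.append(name)
    return names
```

Applied to the listing above, this would select the six most recent journal files, which matches the count given in the message.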
>
> Wait what?
>
> > Now my journals have few extents (2 or 3). But I have seen cases where the more recent
> > files have hundreds of extents, but after a few "journalctl --rotate" runs the older files
> > become less fragmented.
> >
> > [1] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L383
>
> That line doesn't work, and systemd ignores the error.
>
> The NOCOW flag cannot be set or cleared unless the file is empty.
> This is checked in btrfs_ioctl_setflags.
>
> This is not something that can be changed easily--if the NOCOW bit is
> cleared on a non-empty file, btrfs data read code will expect csums
> that aren't present on disk because they were written while the file was
> NODATASUM, and the reads will fail pretty badly. The entire file would
> have to have csums added or removed at the same time as the flag change
> (or all nodatacow file reads take a performance hit looking for csums
> that may or may not be present).
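Since the point above is that the flag change does not take effect on a non-empty file, a caller can confirm the outcome by reading the flags back instead of trusting the set call. The following is a rough sketch, not what systemd does. The ioctl numbers are assumptions taken from `<linux/fs.h>` as they appear on 64-bit x86/arm Linux, and the helper names are hypothetical. It makes no assumption about whether the kernel returns an error or silently leaves the flag alone.

```python
import fcntl
import os
import struct

# Assumed constants from <linux/fs.h> (64-bit x86/arm Linux).
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # displayed as 'C' by lsattr


def has_nocow(flags: int) -> bool:
    """True if the inode flag word has the NOCOW bit set."""
    return bool(flags & FS_NOCOW_FL)


def get_flags(fd: int) -> int:
    """Read the inode flags of an open file descriptor."""
    buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
    return struct.unpack("I", buf)[0]


def clear_nocow_checked(path: str) -> bool:
    """Try to clear NOCOW and report whether the file really is COW afterwards."""
    fd = os.open(path, os.O_RDONLY)
    try:
        flags = get_flags(fd)
        if not has_nocow(flags):
            return True  # nothing to do
        try:
            new_flags = struct.pack("I", flags & ~FS_NOCOW_FL)
            fcntl.ioctl(fd, FS_IOC_SETFLAGS, new_flags)
        except OSError:
            return False  # the kernel refused the change outright
        # Read back: the flag may be unchanged even if the call succeeded.
        return not has_nocow(get_flags(fd))
    finally:
        os.close(fd)
```

On a non-empty NOCOW file on btrfs, `clear_nocow_checked` would be expected to return False, which is the signal to fall back to some other strategy.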
>
> At file close, systemd should copy the data to a new file with no
> special attributes and discard or recycle the old inode. This copy
> will be mostly contiguous and have desirable properties like csums and
> compression, and will have iops equivalent to btrfs fi defrag.
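The copy-on-close approach suggested above can be sketched roughly as follows. This is an illustration under assumptions, not systemd code: the function name `rewrite_as_new_file` is hypothetical, the copy deliberately uses plain reads and writes (a reflink would share the old extents and defeat the purpose), and it assumes the containing directory does not itself carry the NOCOW attribute, since new files would then inherit it.

```python
import os
import shutil
import tempfile


def rewrite_as_new_file(path: str) -> None:
    """Replace `path` with a freshly written copy that lives in a new inode."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the copy in the same directory so the final rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Plain read/write copy in 1 MiB chunks, written sequentially.
            shutil.copyfileobj(src, dst, length=1024 * 1024)
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copymode(path, tmp)  # keep the original permission bits
        os.replace(tmp, path)       # the old inode is dropped here
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

A real implementation would also need to preserve ownership, ACLs and timestamps, and to make sure no writer still holds the old file open.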
>
> > [2] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L3687
> >
> > On 2/10/21 7:37 AM, Chris Murphy wrote:
> > > This is an active (but idle) system.journal file. That is, it's open
> > > but not being written to. I did a sync right before this:
> > >
> > > https://pastebin.com/jHh5tfpe
> > >
> > > And then: btrfs fi defrag -l 8M system.journal
> > >
> > > https://pastebin.com/Kq1GjJuh
> > >
> > > Looks like most of it was a no-op. So it seems btrfs in this case is
> > > not confused by so many small extent items; it knows they are
> > > contiguous?
> > >
> > > It doesn't answer the question what the "too small" threshold is for
> > > BTRFS_IOC_DEFRAG, which is what sd-journald is using, though.
> > >
> > > Another sync, and then, 'journalctl --rotate' and the resulting
> > > archived file is now:
> > >
> > > https://pastebin.com/aqac0dRj
> > >
> > > These are not the same results between the two ioctls for the same
> > > file, and not the same result as what you get with -l 32M (which I do
> > > get if I use the default 32M). The BTRFS_IOC_DEFRAG interleaved result
> > > is peculiar, but I don't think we can say it's ineffective, it might
> > > be an intentional no-op either because it's nodatacow or it sees that
> > > these many extents are mostly contiguous and not worth defragmenting
> > > (which would be good for keeping write amplification down).
> > >
> > > So I don't know, maybe it's not wrong.
> > >
> > > --
> > > Chris Murphy
> > >
> >
> >
> > --
> > gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> > Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5