From: George Mitchell <george@chinilu.com>
To: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: Possible application issue ...
Date: Mon, 07 Apr 2014 07:14:00 -0700
Message-ID: <5342B2A8.7090705@chinilu.com>
In-Reply-To: <pan$e4d1$3c986a6f$c6e086ab$b682327c@cox.net>
On 04/07/2014 05:42 AM, Duncan wrote:
> George Mitchell posted on Sun, 06 Apr 2014 22:25:03 -0700 as excerpted:
>
>> I seem to be having an issue with a specific application. I just
>> installed "Recoll", a really nice desktop search tool. The following
>> day, whenever my backup program would attempt to run, my computer
>> simply stopped dead in its tracks and I was forced to do a hard
>> reboot to get it back. So tonight I have been trying to shake out
>> the problem. And the problem goes like this: whenever I try to
>> defrag the Recoll data files, I get a string of weird messages
>> pouring out from the btrfs defrag program itself, plus flashing
>> messages on the screen regarding some sort of CPU failure problem
>> for both CPUs. As soon as I removed the ".recoll" data directory
>> from the path, everything was fine again.
>> Does anyone know what might be going on here or should I run the thing
>> and try to trap the output and post it and/or send a copy of the data
>> files in question?
> Just a btrfs user and list regular here, not a dev, but...
>
> You'll probably need to post the output for a bug fix... unless it's
> simply the "task blocked for more than NNN seconds" warnings (usually
> NNN is 30/60/90/120/etc.), in which case the general case is known,
> but then you'll want to...
>
> echo w > /proc/sysrq-trigger
>
> ... and post the output from that. That's the info usually requested
> in that case, anyway. And if this is the case, the apparent lockup
> should go away on its own after some time, though it might take a few
> minutes if the files are very heavily fragmented, as is likely.
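>
> For reference, a minimal capture sequence might look like this (just
> a sketch; it assumes you have root and that sysrq is enabled on your
> kernel):
>
> echo w > /proc/sysrq-trigger   # dump blocked-task traces to the kernel log
> dmesg | tail -n 100            # read them back out to paste into a report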
>
>
> Meanwhile, database files fall into a general category of frequently
> internally updated (as opposed to append-only) files that all
> copy-on-write filesystems, btrfs included, have problems with: they
> tend to fragment very fast and hard under COW, because every rewrite
> goes to a new location.
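>
> If you want to see how bad the fragmentation is, filefrag (from
> e2fsprogs) works on btrfs too. A sketch, with a placeholder path;
> note that extent counts can be misleading on compressed btrfs files:
>
> filefrag /path/to/database-file   # reports the extent count; thousands
>                                   # of extents means heavy fragmentation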
>
> How large are the files in question? Are you using the btrfs
> autodefrag mount option? Do you use snapper, or otherwise take lots
> of (likely scripted) snapshots on that subvolume or filesystem?
>
> Generally speaking, if the files aren't too large (perhaps a couple
> hundred MiB or smaller), btrfs' autodefrag option can usually deal with
> the fragmentation as it occurs. This works quite well for firefox sqlite
> databases, for instance.
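>
> If you want to try autodefrag, something like this works (a sketch;
> the mountpoint is a placeholder, and you'd add the option to fstab to
> make it stick across reboots):
>
> mount -o remount,autodefrag /home   # enable on an already-mounted btrfs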
>
> Once the files in question get over perhaps half a gigabyte in size,
> however, that doesn't work so well, particularly if the file is being
> updated at a steady pace in real time, since autodefrag queues the
> entire file for rewrite in order to defrag it, and at some point the
> rewriting can't keep up with the incoming updates.
>
> For large internal-rewrite-pattern files, there's the NOCOW extended
> file attribute, which tells btrfs to rewrite the files in place. It
> also disables the usual checksumming and such, which takes time and
> complicates things on database files anyway, since the database
> generally has some file integrity management of its own that can
> "fight" with the management btrfs does.
>
> But to be effective, setting nocow (chattr +C /path/to/file/or/dir) needs
> to be done while the file is still zero size, before it has any content.
> The easiest way to do that is to set it on the directory, before the
> files in the directory are created, so they inherit the nocow attribute
> from the directory they're created in.
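>
> Concretely, the inheritance trick looks like this (a sketch; the
> paths are placeholders):
>
> mkdir /path/to/newdir
> chattr +C /path/to/newdir     # set nocow on the still-empty directory
> lsattr -d /path/to/newdir     # verify: the 'C' flag should appear
> cp bigfile /path/to/newdir/   # files created inside inherit nocow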
>
> The easiest solution at this point might be to delete the current
> fragmented files instead of trying to defrag them, set up nocow on
> the directory that will contain them, and then trigger a reindexing.
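>
> For the recoll case that might look like the following (a sketch: it
> assumes the indexer is stopped first, and the config filename inside
> ~/.recoll is an assumption, so check before deleting anything):
>
> mv ~/.recoll ~/.recoll.old    # keep the old data until the rebuild works
> mkdir ~/.recoll
> chattr +C ~/.recoll           # the rebuilt index files will inherit nocow
> cp ~/.recoll.old/recoll.conf ~/.recoll/   # config filename assumed
> # now let recoll rebuild its index, then remove ~/.recoll.old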
>
>
> However, there's one additional caveat involving snapshots. By
> definition, the first change to a file block after a snapshot will be
> copy-on-write despite the nocow attribute. This is because the
> snapshot froze the existing file data in place as it was, so a change
> to it must be written to a new location even if the file is set
> nocow. This shouldn't be too big a problem if you're just taking a
> snapshot manually every week or so, but if you're using snapper or a
> similar automated script to take hourly or even per-minute snapshots,
> the effect is likely to be nearly as bad as if the file wasn't set
> nocow in the first place!
>
> If this is the case, creating a dedicated subvolume for the directory
> containing these files is the best idea, since snapshots stop at
> subvolume boundaries: as long as you're not snapshotting the
> subvolume, you can set nocow on directories and files within it and
> not have to worry about snapshot-triggered cow undermining your
> efforts.
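>
> Something along these lines (a sketch; it assumes the data can be
> briefly moved aside while the subvolume is created):
>
> mv ~/.recoll ~/.recoll.tmp
> btrfs subvolume create ~/.recoll   # snapshots of the parent stop here
> chattr +C ~/.recoll
> cp -a ~/.recoll.tmp/. ~/.recoll/   # copying recreates the files, so
>                                    # they pick up the nocow attribute
> rm -r ~/.recoll.tmp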
>
I think you nailed it in terms of this being comparable to stuff like
virtual machine images and bittorrent. These are indeed a collection
of multiple large databases, one over 6GB in size, so it becomes
obvious why defrag is choking on them. It was late last night when I
posted this, but thinking on it through the night, I realized this
might be what is going on. So at this point I am just going to
continue filtering these files out of the defrag. I don't typically
use databases, so this kind of blindsided me. But thanks for
confirming what I was already beginning to suspect. This desktop
search program IS active continually, and I strongly suspect the two
programs are colliding in midair as they try to manipulate the
database content on the drive. It really does produce a train wreck,
system-wise. Thanks again for the pointers and reminders on this.
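
For anyone wanting to do the same filtering, it amounts to something
like this (a sketch; the mountpoint is a placeholder and your own
exclude list may differ):

find /home -xdev -type f -not -path '*/.recoll/*' \
    -exec btrfs filesystem defragment {} +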