Subject: Re: [4.4.1] btrfs-transacti frequent high CPU usage despite little fragmentation
From: Ole Langbehn
To: linux-btrfs@vger.kernel.org
Date: Fri, 18 Mar 2016 10:33:46 +0100

Duncan, thanks for your extensive answer.

On 17.03.2016 11:51, Duncan wrote:
> Ole Langbehn posted on Wed, 16 Mar 2016 10:45:28 +0100 as excerpted:
>
> Have you tried the autodefrag mount option, then defragging? That should
> help keep rewritten files from fragmenting so heavily, at least. On
> spinning rust it doesn't play so well with large (half-gig plus)
> databases or VM images, but on ssds it should scale to rather larger
> files; on fast SSDs I'd not expect problems until 1-2 GiB, possibly
> higher.

Since I do have some big VM images, I never tried autodefrag.

> For large dbs or VM images, too large for autodefrag to handle well, the
> nocow attribute is the usual suggestion, but I'll skip the details on
> that for now, as you may not need it with autodefrag on an ssd, unless
> your database and VM files are several gig apiece.

Since my original post, I have experimented with setting the firefox
places.sqlite to nodatacow (on a new file; rough commands below). It has
stayed at 1 extent since, so it seems to work.

>> BTW: I did a VACUUM on the sqlite db and afterwards it had 1 extent.
>> Expected, just saying that vacuuming seems to be a good measure for
>> defragmenting sqlite databases.
>
> I know the concept, but out of curiosity, what tool do you use for
> that? I imagine my firefox sqlite dbs could use some vacuuming as well,
> but don't have the foggiest idea how to go about it.

A simple call of the command line interface, like with any other SQL DB:

# sqlite3 /path/to/db.sqlite "VACUUM;"

> Of *most* importance, you really *really* need to do something about that
> data chunk imbalance, and to a lesser extent that metadata chunk
> imbalance, because your unallocated space is well under a gig (306 MiB),
> with all that extra space, hundreds of gigs of it, locked up in unused or
> only partially used chunks.

I'm curious - why is that a bad thing?
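
Coming back to the nodatacow experiment mentioned above, here is roughly
what it amounts to; this is only a sketch, the profile directory is a
placeholder, firefox has to be shut down first, and the one real
constraint is that chattr +C only takes effect on an empty file on btrfs:

$ cd ~/.mozilla/firefox/<profile>/         # placeholder profile directory
$ mv places.sqlite places.sqlite.bak       # keep the old, fragmented copy
$ touch places.sqlite                      # create a new, empty file
$ chattr +C places.sqlite                  # set nodatacow while the file is still empty
$ cat places.sqlite.bak > places.sqlite    # copy the data into the nocow file
$ lsattr places.sqlite                     # should now show the 'C' attribute

Setting +C on a whole directory also works; files created in it
afterwards inherit the attribute.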
> The subject says 4.4.1, but it's unclear whether that's your kernel
> version or your btrfs-progs userspace version. If that's your userspace
> version and you're running an old kernel, strongly consider upgrading to
> the LTS kernel 4.1 or 4.4 series if possible, or at least the LTS series
> before that, 3.18. Those or the latest couple current kernel series, 4.5
> and 4.4, and 4.3 for the moment as 4.5 is /just/ out, are the recommended
> and best supported versions.

# uname -r
4.4.1-gentoo
# btrfs --version
btrfs-progs v4.4.1

So, both 4.4.1 ;), but I meant userspace.

> Try this:
>
> btrfs balance start -dusage=0 -musage=0

Did this, although I'm reasonably up to date kernel-wise. I am very sure
that the filesystem has never seen <3.18. Took some minutes, ended up with:

# btrfs filesystem usage /
Overall:
    Device size:                 915.32GiB
    Device allocated:            681.32GiB
    Device unallocated:          234.00GiB
    Device missing:                  0.00B
    Used:                        153.80GiB
    Free (estimated):            751.08GiB  (min: 751.08GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,single: Size:667.31GiB, Used:150.22GiB
   /dev/sda2     667.31GiB

Metadata,single: Size:14.01GiB, Used:3.58GiB
   /dev/sda2      14.01GiB

System,single: Size:4.00MiB, Used:112.00KiB
   /dev/sda2       4.00MiB

Unallocated:
   /dev/sda2     234.00GiB

-> Helped with data, not with metadata.

> Then start with metadata, and up the usage numbers, which are
> percentages, like this:
>
> btrfs balance start -musage=5
>
> Then if it works up the number to 10, 20, etc.

Upped it to 70; it relocated a total of 13 out of 685 chunks:

Metadata,single: Size:5.00GiB, Used:3.58GiB
   /dev/sda2       5.00GiB

> Once you have several gigs in unallocated, then try the same thing with
> data:
>
> btrfs balance start -dusage=5
>
> And again, increase it in increments of 5 or 10% at a time, to 50 or
> 70%.

Did

# btrfs balance start -dusage=70

straight away. It took ages and regularly froze processes for minutes;
after about 8h the status is:

# btrfs balance status /
Balance on '/' is paused
192 out of about 595 chunks balanced (194 considered), 68% left

# btrfs filesystem usage /
Overall:
    Device size:                 915.32GiB
    Device allocated:            482.04GiB
    Device unallocated:          433.28GiB
    Device missing:                  0.00B
    Used:                        154.36GiB
    Free (estimated):            759.48GiB  (min: 759.48GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,single: Size:477.01GiB, Used:150.80GiB
   /dev/sda2     477.01GiB

Metadata,single: Size:5.00GiB, Used:3.56GiB
   /dev/sda2       5.00GiB

System,single: Size:32.00MiB, Used:96.00KiB
   /dev/sda2      32.00MiB

Unallocated:
   /dev/sda2     433.28GiB

-> Looking good. Will proceed when I don't need the box to actually be
responsive.

> Second thing, consider tweaking your trim/discard policy [...]
>
> The recommendation is to put fstrim in a cron or systemd timer job,
> executing it weekly or similar, preferably at a time when all those
> unqueued trims won't affect your normal work.

It has been in cron.weekly since the creation of the filesystem (full
script in the P.S. below):

fstrim -v / >> $LOG

Cheers,

Ole
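
P.S. For completeness, a minimal sketch of that cron.weekly script; the
log path below is a made-up placeholder standing in for whatever $LOG
actually expands to here:

#!/bin/sh
# /etc/cron.weekly/fstrim: batch-discard free space on the btrfs root once a week
LOG=/var/log/fstrim.log            # placeholder path, substitute your own log file
fstrim -v / >> "$LOG" 2>&1

(Marked executable so the weekly cron run picks it up.)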