linux-btrfs.vger.kernel.org archive mirror
* Re: Major HDD performance degradation on btrfs receive
@ 2016-02-22 19:58 Nazar Mokrynskyi
  2016-02-22 23:30 ` Duncan
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Nazar Mokrynskyi @ 2016-02-22 19:58 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5933 bytes --]

> On Tue, Feb 16, 2016 at 5:44 AM, Nazar Mokrynskyi <na...@mokrynskyi.com> wrote:
> > I have 2 SSDs with a BTRFS filesystem (RAID) on them and several subvolumes.
> > Every 15 minutes I create read-only snapshots of the subvolumes /root, /home
> > and /web inside /backup.
> > After that I search for the latest common snapshot on /backup_hdd and send
> > the difference between that common snapshot and the newest snapshot to
> > /backup_hdd.
> > On top of all that there is snapshot rotation, so /backup contains far
> > fewer snapshots than /backup_hdd.
> >
> > I've been using this setup for the last 7 months or so, and this is luckily
> > the longest period in which I've had no problems with BTRFS at all.
> > However, for the last 2+ months the btrfs receive command has been loading
> > the HDD so heavily that I can't even get a directory listing from it.
> > This happens even if the diff between snapshots is really small.
> > The HDD contains 2 filesystems - the mentioned BTRFS one and an ext4 one for
> > other files - so I can't even play an mp3 file from the ext4 filesystem
> > while btrfs receive is running.
> > Since I'm running everything every 15 minutes this is a real headache.
> >
> > My guess is that the performance hit might be caused by filesystem
> > fragmentation, even though there is more than enough empty space. But I'm
> > not sure how to check this properly, and obviously I can't run
> > defragmentation on read-only subvolumes.
> >
> > I'd be grateful for anything that might help identify and resolve this
> > issue.
> >
> > ~> uname -a
> > Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 x86_64
> > x86_64 x86_64 GNU/Linux
> >
> > ~> btrfs --version
> > btrfs-progs v4.4
> >
> > ~> sudo btrfs fi show
> > Label: none  uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
> >     Total devices 2 FS bytes used 71.00GiB
> >     devid    1 size 111.30GiB used 111.30GiB path /dev/sdb2
> >     devid    2 size 111.30GiB used 111.29GiB path /dev/sdc2
> >
> > Label: 'Backup'  uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8
> >     Total devices 1 FS bytes used 252.54GiB
> >     devid    1 size 800.00GiB used 266.08GiB path /dev/sda1
> >
> > ~> sudo btrfs fi df /
> > Data, RAID0: total=214.56GiB, used=69.10GiB
> > System, RAID1: total=8.00MiB, used=16.00KiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, RAID1: total=4.00GiB, used=1.87GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> > ~> sudo btrfs fi df /backup_hdd
> > Data, single: total=245.01GiB, used=243.61GiB
> > System, DUP: total=32.00MiB, used=48.00KiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, DUP: total=10.50GiB, used=8.93GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> > Relevant mount options:
> > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    / btrfs
> > compress=lzo,noatime,relatime,ssd,subvol=/root    0 1
> > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /home btrfs
> > compress=lzo,noatime,relatime,ssd,subvol=/home 0    1
> > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /backup btrfs
> > compress=lzo,noatime,relatime,ssd,subvol=/backup 0    1
> > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /web btrfs
> > compress=lzo,noatime,relatime,ssd,subvol=/web 0    1
> > UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8    /backup_hdd btrfs
> > compress=lzo,noatime,relatime,noexec 0    1
> As already indicated by Duncan, the number of snapshots might just be
> too high. The fragmentation on the HDD might have become very high. If
> there is a limited amount of RAM in the system (and thus limited
> caching), too much time is lost in seeks. In addition:
>
>   compress=lzo
> this also increases the chance of scattered extents and thus fragmentation.
>
>   noatime,relatime
> I am not sure why you have both. Hopefully the actual mount ends up
> with   noatime
>
> You could use the principles of the tool/package called  snapper  to
> do a sort of non-linear snapshot thinning: the further back in time,
> the fewer snapshots you keep over a given timeframe (i.e. a coarser
> granularity).
>
> You could use skinny metadata (recreate the fs with newer tools or use
> btrfstune -x on /dev/sda1). I think at the moment this flag is not
> enabled on /dev/sda1.
>
> If you put just 1 btrfs fs on the HDD (i.e. move all the content from
> the ext4 fs into the btrfs fs) you might get better overall
> performance. I assume the ext4 fs is on the second (slower) part of
> the HDD, and that is a disadvantage I think.
> But you probably have reasons for why the setup is the way it is.
I've replied to Duncan's message about the number of snapshots: there is
snapshot rotation in place and the number of snapshots is quite small,
491 in total.

About memory - 16 GiB of RAM should be enough, I guess :) Can I somehow
measure whether seeking is the problem?
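
If it helps, I suppose I could watch the disk while a receive is
running - assuming iostat from the sysstat package is the right tool
for this, something like:

~> iostat -x /dev/sda 5

and if "await" stays high and %util sits near 100% while r/s and w/s
stay low, I'd read that as the disk being seek-bound rather than
bandwidth-bound.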

What is wrong with noatime,relatime? I've been using them for a long
time as a good compromise in terms of performance.
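
To double-check which of the two actually ends up in effect, I can look
at the live mount options (findmnt is from util-linux; grepping
/proc/mounts would show the same thing):

~> findmnt -no OPTIONS /backup_hdd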

I'll try btrfstune -x and let you know whether it changes anything.
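
For reference, this is roughly what I plan to run - assuming I remember
correctly that btrfstune wants the filesystem unmounted, and that
btrfs-show-super from btrfs-progs v4.4 decodes the incompat flags so
SKINNY_METADATA shows up once it is set:

~> sudo umount /backup_hdd
~> sudo btrfstune -x /dev/sda1
~> sudo btrfs-show-super /dev/sda1 | grep incompat_flags
~> sudo mount /backup_hdd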

About ext4 - it is there because I did have some serious problems with
BTRFS over the past 2.5 years or so (the first time I recovered the
files by manually building the git version of btrfs-tools, and the last
time I had a not-quite-up-to-date backup of everything elsewhere, so I
didn't lose too much unrecoverable data), so for a while I'd like to
store some files separately on a filesystem that is extremely difficult
to break. Its content is not critical in terms of performance, the
files do not compress well, and I do not really need any other extended
features on that partition - so ext4 will be there for a while.

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: nazarpc@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249



[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 3825 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: Major HDD performance degradation on btrfs receive
@ 2016-02-22 19:39 Nazar Mokrynskyi
  0 siblings, 0 replies; 34+ messages in thread
From: Nazar Mokrynskyi @ 2016-02-22 19:39 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 9963 bytes --]

> > I have 2 SSDs with a BTRFS filesystem (RAID) on them and several
> > subvolumes. Every 15 minutes I create read-only snapshots of the
> > subvolumes /root, /home and /web inside /backup.
> > After that I search for the latest common snapshot on /backup_hdd
> > and send the difference between that common snapshot and the newest
> > snapshot to /backup_hdd.
> > On top of all that there is snapshot rotation, so /backup contains
> > far fewer snapshots than /backup_hdd.
> One thing that you imply, but don't actually make explicit except
> in the btrfs command output and mount options listing, is that /backup_hdd
> is a mountpoint for a second entirely independent btrfs (LABEL=Backup),
> while /backup is a subvolume on the primary / btrfs.  Knowing that is
> quite helpful in figuring out exactly what you're doing. =:^)
>
> Further, implied, but not explicit since some folks use hdd when
> referring to ssds as well, is that the /backup_hdd hdd is spinning rust,
> tho you do make it explicit that the primary btrfs is on ssds.
>
> > I've been using this setup for the last 7 months or so, and this is
> > luckily the longest period in which I've had no problems with BTRFS
> > at all.
> > However, for the last 2+ months the btrfs receive command has been
> > loading the HDD so heavily that I can't even get a directory listing
> > from it.
> > This happens even if the diff between snapshots is really small.
> > The HDD contains 2 filesystems - the mentioned BTRFS one and an ext4
> > one for other files - so I can't even play an mp3 file from the ext4
> > filesystem while btrfs receive is running.
> > Since I'm running everything every 15 minutes this is a real headache.
>
> The *big* question is how many snapshots you have on LABEL=Backup, since
> you mention rotating backups in /backup, but don't mention rotating/
> thinning backups on LABEL=Backup, and do explicitly state that it has far
> more snapshots, and with four snapshots an hour per subvolume, they'll
> build up rather fast if you aren't thinning them.
>
> The rest of this post assumes that's the issue, since you didn't mention
> thinning out the snapshots on LABEL=Backup.  If you're already familiar
> with the snapshot scaling issue and snapshot caps and thinning
> recommendations regularly posted here, feel free to skip the below as
> it'll simply be review. =:^)
>
> Btrfs has scaling issues when there's too many snapshots.  The
> recommendation I've been using is a target of no more than 250 snapshots
> per subvolume, with no more than eight, and ideally no more than four,
> snapshotted subvolumes per filesystem, which, doing the math, leads to
> an overall filesystem target snapshot cap of
> 1000-2000, and definitely no more than 3000, tho by that point the
> scaling issues are beginning to kick in and you'll feel it in lost
> performance, particularly on spinning rust, when doing btrfs maintenance
> such as snapshotting, send/receive, balance, check, etc.
>
> Unfortunately, many people post here complaining about performance issues
> when they're running 10K+ or even 100K+ snapshots per filesystem and the
> various btrfs maintenance commands have almost ground to a halt. =:^(
>
> You say you're snapshotting three subvolumes, / /home and /web, at 15
> minute intervals.  That's 3*4=12 snapshots per hour, 12*24=288 snapshots
> per day.  If all those are on LABEL=Backup, you're hitting the 250
> snapshots per subvolume target in 250/4/24 = ... just over 2 and a half
> days.  And you're hitting the total per-filesystem snapshots target cap
> in 2000/288= ... just under seven days.
>
> If you've been doing that for 7 months with no thinning, that's
> 7*30*288= ... over 60K snapshots!  No *WONDER* you're seeing performance
> issues!
>
> Meanwhile, say you need a file from a snapshot from six months ago.  Are
> you *REALLY* going to care, or even _know_, exactly what 15 minute
> snapshot it was?  And even if you do, just digging thru 60K+ snapshots...
> OK, so we'll assume you sort them by snapshotted subvolume so only have
> to dig thru 20K+ snapshots... just digging thru 20K snapshots to find the
> exact 15-minute snapshot you need... is quite a bit of work!
>
> Instead, suppose you have a "reasonable" thinning program.  First, do you
> really need _FOUR_ snapshots an hour to LABEL=Backup?  Say you make it
> every 20 minutes, three an hour instead of four.  That already kills a
> third of them.  Then, say you take them every 15 or 20 minutes, but only
> send one per hour to LABEL=Backup.  (Or if you want, do them every 15
> minutes and send only ever other one, half-hourly to LABEL=Backup.  The
> point is to keep it both something you're comfortable with but also more
> reasonable.)
>
> For illustration, I'll say you send once an hour.  That's 3*24=72
> snapshots per day, 24/day per subvolume, already a great improvement over
> the 96/day/subvolume and 288/day total you're doing now.
>
> If then, once a day, you thin the third day back down to every other
> hour, you'll have 2-3 days' worth of hourly snapshots on LABEL=Backup,
> so up to 72 hourly snapshots per subvolume.  If on the 8th day you thin
> down to six-hourly, 4/day, cutting out 2/3, you'll have five days of
> 12/day/subvolume, 60 snapshots per subvolume, plus the 72, 132 snapshots
> per subvolume total, out to 8 days, so you can recover over a week's
> worth at a granularity of 2 hours or better, if needed.
>
> If then on the 32nd day (giving you a month's worth of at least 4X/day),
> you cut every other one, giving you twice-a-day snapshots, that's 24 days
> of 2X/day or 48 snapshots per subvolume, plus the 132 from before, 180
> snapshots per subvolume total, now.
>
> If then on the 92nd day (giving you two more months of 2X/day, a quarter's
> worth of at least 2X/day) you again thin every other one, to one per day,
> you have 60 days @ 2X/day or 120 snapshots per subvolume, plus the 180 we
> had already, 300 snapshots per subvolume, now.
>
> OK, so we're already over our target 250/subvolume, so we could thin a
> bit more drastically.  However, we're only snapshotting three subvolumes,
> so we can afford a bit of lenience on the per-subvolume cap as that's
> assuming 4-8 snapshotted subvolumes, and we're still well under our total
> filesystem snapshot cap.
>
> If then you keep another quarter's worth of daily snapshots, out to 183
> days, that's 91 days of daily snapshots, 91 per subvolume, on top of the
> 300 we had, so now 391 snapshots per subvolume.
>
> If you then thin to weekly snapshots, cutting 6/7, and keep them around
> another 27 weeks (just over half a year, thus over a year total), that's
> 27 more snapshots per subvolume, plus the 391 we had, 418 snapshots per
> subvolume total.
>
> 418 snapshots per subvolume total, starting at 3-4X per hour to /backup
> and hourly to LABEL=Backup, thinning down gradually to weekly after six
> months and keeping that for the rest of the year.  Given that you're
> snapshotting three subvolumes, that's 1254 snapshots total, still well
> within the 1000-2000 total snapshots per filesystem target cap.
>
> During that year, if the data is worth it, you should have done an offsite
> or at least offline backup, we'll say quarterly.  After that, keeping the
> local online backup around is merely for convenience, and with quarterly
> backups, after a year you have multiple copies and can simply delete the
> year-old snapshots, one a week, probably at the same time you thin down
> the six-month-old daily snapshots to weekly.
>
> Compare that just over 1200 snapshots to the 60K+ snapshots you may have
> now, knowing that scaling over 10K snapshots is an issue particularly on
> spinning rust, and you should be able to appreciate the difference it's
> likely to make. =:^)
>
> But at the same time, in practice it'll probably be much easier to
> actually retrieve something from a snapshot a few months old, because you
> won't have tens of thousands of effectively useless snapshots to sort
> thru as you will be regularly thinning them down! =:^)
>
> > ~> uname [-r]
> > 4.5.0-rc4-haswell
> >
> > ~> btrfs --version
> > btrfs-progs v4.4
>
> You're staying current with your btrfs versions.  Kudos on that! =:^)
>
> And on including btrfs fi show and btrfs fi df, as they were useful, tho
> I'm snipping them here.
>
> One more tip.  Btrfs quotas are known to have scaling issues as well.  If
> you're using them, they'll exacerbate the problem.  And while I'm not
> sure about current 4.4 status, thru 4.3 at least, they were buggy and not
> reliable anyway.  So the recommendation is to leave quotas off on btrfs,
> and use some other more mature filesystem where they're known to work
> reliably if you really need them.
>
> -- 
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
First of all, sorry for the delay; for whatever reason I was not
subscribed to the mailing list.

You are right, the RAID is on 2 SSDs and /backup_hdd (LABEL=Backup) is
a separate, real HDD.

The example was simplified to give an overview without digging too deep
into details. I actually have proper backup rotation in place, so we are
not talking about thousands of snapshots :)
Here is the tool I've created and am using right now:
https://github.com/nazar-pc/just-backup-btrfs
I'm keeping all snapshots for the last day, up to 90 for the last month
and up to 48 throughout the year.
So as a result (counted as shown below) there are:
* 166 snapshots in /backup_hdd/root
* 166 snapshots in /backup_hdd/home
* 159 snapshots in /backup_hdd/web
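
The counts come from btrfs subvolume list, roughly along these lines -
assuming the snapshots sit directly under root/, home/ and web/ on that
filesystem:

~> sudo btrfs subvolume list -s /backup_hdd | grep -c 'path root/'
~> sudo btrfs subvolume list -s /backup_hdd | grep -c 'path home/'
~> sudo btrfs subvolume list -s /backup_hdd | grep -c 'path web/'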

I'm not using quotas; there is nothing on this BTRFS partition besides
the mentioned snapshots.

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: nazarpc@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249



[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 3825 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Major HDD performance degradation on btrfs receive
@ 2016-02-16  4:44 Nazar Mokrynskyi
  2016-02-16  9:10 ` Duncan
  2016-02-18 18:19 ` Henk Slager
  0 siblings, 2 replies; 34+ messages in thread
From: Nazar Mokrynskyi @ 2016-02-16  4:44 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3355 bytes --]

I have 2 SSDs with a BTRFS filesystem (RAID) on them and several
subvolumes. Every 15 minutes I create read-only snapshots of the
subvolumes /root, /home and /web inside /backup.
After that I search for the latest common snapshot on /backup_hdd and
send the difference between that common snapshot and the newest
snapshot to /backup_hdd.
On top of all that there is snapshot rotation, so /backup contains far
fewer snapshots than /backup_hdd.
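
Roughly, each cycle boils down to an incremental send like the
following (snapshot names here are only illustrative, my script
generates timestamped ones):

~> sudo btrfs subvolume snapshot -r /home /backup/home@2016-02-16_04:30
~> sudo btrfs send -p /backup/home@2016-02-16_04:15 \
       /backup/home@2016-02-16_04:30 | sudo btrfs receive /backup_hdd/home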

I've been using this setup for the last 7 months or so, and this is
luckily the longest period in which I've had no problems with BTRFS at
all.
However, for the last 2+ months the btrfs receive command has been
loading the HDD so heavily that I can't even get a directory listing
from it.
This happens even if the diff between snapshots is really small.
The HDD contains 2 filesystems - the mentioned BTRFS one and an ext4
one for other files - so I can't even play an mp3 file from the ext4
filesystem while btrfs receive is running.
Since I'm running everything every 15 minutes this is a real headache.

My guess is that the performance hit might be caused by filesystem
fragmentation, even though there is more than enough empty space. But
I'm not sure how to check this properly, and obviously I can't run
defragmentation on read-only subvolumes.
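
One thing I could probably check myself is the extent count of a few
large files inside the snapshots, assuming filefrag (from e2fsprogs)
gives usable numbers here - with compress=lzo it apparently reports
each ~128KiB compressed extent separately, so the figures would only be
a rough indication (the path is just an example):

~> sudo filefrag /backup_hdd/root/<some snapshot>/var/log/syslog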

I'd be grateful for anything that might help identify and resolve this
issue.

~> uname -a
Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 
x86_64 x86_64 x86_64 GNU/Linux

~> btrfs --version
btrfs-progs v4.4

~> sudo btrfs fi show
Label: none  uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
     Total devices 2 FS bytes used 71.00GiB
     devid    1 size 111.30GiB used 111.30GiB path /dev/sdb2
     devid    2 size 111.30GiB used 111.29GiB path /dev/sdc2

Label: 'Backup'  uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8
     Total devices 1 FS bytes used 252.54GiB
     devid    1 size 800.00GiB used 266.08GiB path /dev/sda1

~> sudo btrfs fi df /
Data, RAID0: total=214.56GiB, used=69.10GiB
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=4.00GiB, used=1.87GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

~> sudo btrfs fi df /backup_hdd
Data, single: total=245.01GiB, used=243.61GiB
System, DUP: total=32.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=10.50GiB, used=8.93GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

Relevant mount options:
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /            btrfs    compress=lzo,noatime,relatime,ssd,subvol=/root      0    1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /home        btrfs    compress=lzo,noatime,relatime,ssd,subvol=/home      0    1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /backup      btrfs    compress=lzo,noatime,relatime,ssd,subvol=/backup    0    1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030    /web         btrfs    compress=lzo,noatime,relatime,ssd,subvol=/web       0    1
UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8    /backup_hdd  btrfs    compress=lzo,noatime,relatime,noexec                0    1

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: nazarpc@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249



[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 3825 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-05-27  2:03 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22 19:58 Major HDD performance degradation on btrfs receive Nazar Mokrynskyi
2016-02-22 23:30 ` Duncan
2016-02-23 17:26   ` Marc MERLIN
2016-02-23 17:34     ` Marc MERLIN
2016-02-23 18:01       ` Lionel Bouton
2016-02-23 18:30         ` Marc MERLIN
2016-02-23 20:35           ` Lionel Bouton
2016-02-24 10:01     ` Patrik Lundquist
2016-02-23 16:55 ` Nazar Mokrynskyi
2016-02-23 17:05   ` Alexander Fougner
2016-02-23 17:18     ` Nazar Mokrynskyi
2016-02-23 17:29       ` Alexander Fougner
2016-02-23 17:34         ` Nazar Mokrynskyi
2016-02-23 18:09           ` Austin S. Hemmelgarn
2016-02-23 17:44 ` Nazar Mokrynskyi
2016-02-24 22:32   ` Henk Slager
2016-02-24 22:46     ` Nazar Mokrynskyi
     [not found]     ` <ce805cd7-422c-ab6a-fbf8-18a304aa640d@mokrynskyi.com>
2016-02-25  1:04       ` Henk Slager
2016-03-15  0:47         ` Nazar Mokrynskyi
2016-03-15 23:11           ` Henk Slager
2016-03-16  3:37             ` Nazar Mokrynskyi
2016-03-16  4:18               ` Chris Murphy
2016-03-16  4:23                 ` Nazar Mokrynskyi
2016-03-16  6:51                   ` Chris Murphy
2016-03-16 11:53                     ` Austin S. Hemmelgarn
2016-03-16 20:58                       ` Chris Murphy
2016-03-16  4:22               ` Chris Murphy
2016-03-17  7:00               ` Duncan
2016-03-18 14:22                 ` Nazar Mokrynskyi
2016-05-27  1:57                   ` Nazar Mokrynskyi
  -- strict thread matches above, loose matches on Subject: below --
2016-02-22 19:39 Nazar Mokrynskyi
2016-02-16  4:44 Nazar Mokrynskyi
2016-02-16  9:10 ` Duncan
2016-02-18 18:19 ` Henk Slager

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).