From: Nick Dimov <dimovnike@gmail.com>
To: Robert White <rwhite@pobox.com>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs receive being very slow
Date: Fri, 19 Dec 2014 18:22:47 +0200
Message-ID: <549450D7.7040207@gmail.com>
In-Reply-To: <548EA090.60308@pobox.com>

Hello.

So I split the job into two tasks, as per your suggestion. I create the
differential snapshot with btrfs send and save it on the SSD. So far this
is very efficient, and the sending happens at almost full SSD speed.

When I try to "receive" the snapshot on the HDD, the speed is just as
low as before (as with the ionice'd pipe), even though no ionice is used.
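
For reference, a minimal sketch of the two-step transfer described above,
using the -f option Robert mentioned for both send and receive. The stream
file name is a hypothetical placeholder, and the other paths are the
placeholders already used in this thread. By default the script only prints
the commands it would run.

```shell
#!/bin/sh
# Sketch only. STREAM is a hypothetical file name; the other paths are the
# placeholders used elsewhere in this thread.
PARENT=/ssd/previously_synced_snapshot
SNAP=/ssd/snapshot-X
STREAM=/ssd/snapshot-X.stream
DEST=/hdd/snapshots

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

two_step_transfer() {
    # Step 1: write the differential stream to a file on the SSD.
    run btrfs send -p "$PARENT" -f "$STREAM" "$SNAP" || return
    # Step 2: replay the saved stream onto the HDD, with no pipe involved.
    run btrfs receive -f "$STREAM" "$DEST"
}

two_step_transfer
```

Running it with DRY_RUN=0 (as root) would execute the two commands for real.
Keeping the steps separate means the receive can be timed on its own.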

The hdd raw speed is, according to hdparm:
 Timing cached reads:   15848 MB in  2.00 seconds = 7928.82 MB/sec
 Timing buffered disk reads: 310 MB in  3.01 seconds = 103.02 MB/sec
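
A small sketch for pulling the buffered read rate out of output like the
above, so it can be compared with the rate pv reports during a receive. It
assumes the output comes from "hdparm -tT" and that the line format matches
the one quoted here. The function name is made up for illustration.

```shell
#!/bin/sh
# Reads hdparm -tT output on stdin and prints the buffered read rate
# (the number in front of "MB/sec").
buffered_read_rate() {
    awk '/Timing buffered disk reads/ { print $(NF - 1) }'
}

# Example using the line quoted above. On a real system one would pipe
# "hdparm -tT /dev/sdX" (placeholder device name, needs root) instead.
printf '%s\n' ' Timing buffered disk reads: 310 MB in  3.01 seconds = 103.02 MB/sec' |
    buffered_read_rate
```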

And I have more than 100 GB of free space on it, but the speed is still low.

So, as you mentioned, I might be dealing with a very fragmented
filesystem. That leaves me with some conclusions and questions:

 1. btrfs send is not the problem: it works great with or without
    ionice.
 2. The receive is slow no matter what I do, even if run alone. (As for
    what kind of data is being sent: I sent snapshots of / and /home,
    and both are slow for btrfs receive.)
 3. How can I check how fragmented the filesystem is? (I.e. I want to
    know if this is the real cause.)
 4. How can I defragment all those read-only snapshots without breaking
    compatibility with differential btrfs send? (If I understand it
    correctly, the parent snapshot must be the same on source and
    destination. Is this correct?)
 5. Will making those snapshots writable, defragmenting them and
    re-snapshotting them as read-only break compatibility with
    differential btrfs send? E.g. will I still be able to "btrfs receive"
    a differential snapshot after defragmentation?
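
On question 3, one rough way to get a first impression is to count extents
per file with filefrag (from e2fsprogs) and list the worst offenders. This is
only a sketch under assumptions: the function names are made up, the snapshot
path in the usage line is a placeholder, and per-file extent counts say
nothing about free-space fragmentation. On btrfs with compression enabled,
filefrag also tends to report many extents for files that are not really
fragmented.

```shell
#!/bin/sh
# Turns "path: N extents found" lines (filefrag's summary format) into
# "N<TAB>path", highest extent count first, keeping the top $1 entries.
rank_by_extents() {
    awk '{
        n = $(NF - 2)                          # the extent count
        sub(/: [0-9]+ extents? found$/, "")    # strip the summary suffix
        print n "\t" $0
    }' | sort -rn | head -n "${1:-10}"
}

# Scans regular files below directory $1 without crossing mount points.
most_fragmented() {
    find "$1" -xdev -type f -exec filefrag {} + 2>/dev/null |
        rank_by_extents "${2:-10}"
}

# Usage (path is a placeholder): most_fragmented /hdd/snapshots/some-snapshot 20
```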

As for your suggestion to do it during a break: I would have done that,
but it sometimes takes hours to sync. That's why I tried to ionice it, so
I can work while it runs.

Thank you a lot for your explanations and effort!

On 15.12.2014 10:49, Robert White wrote:
> On 12/14/2014 11:41 PM, Nick Dimov wrote:
>> Hi, thanks for the answer, I will answer between the lines.
>>
>> On 15.12.2014 08:45, Robert White wrote:
>>> On 12/14/2014 08:50 PM, Nick Dimov wrote:
>>>> Hello everyone!
>>>>
>>>> First, thanks for amazing work on btrfs filesystem!
>>>>
>>>> Now the problem:
>>>> I use an SSD as my system drive (/dev/sda2) and take daily snapshots on
>>>> it. Then, from time to time, I sync those to the HDD (/dev/sdb4) using
>>>> btrfs send / receive like this:
>>>>
>>>> ionice -c3 btrfs send -p /ssd/previously_synced_snapshot \
>>>>     /ssd/snapshot-X | pv | btrfs receive /hdd/snapshots
>>>>
>>>> I use pv to measure speed and I get ridiculous speeds like 5-200 KiB/s!
>>>> (rarely it goes over 1 MiB/s). However, if I replace the btrfs receive
>>>> with cat >/dev/null, the speed is 400-500 MiB/s (almost full SSD speed),
>>>> so I understand that the problem is the fs on the HDD... Do you have
>>>> any idea of how to trace this problem down?
>>>
>>>
>>> You have _lots_ of problems with that above...
>>>
>>> (1) your ionice is causing the SSD to stall the send every time the
>>> receiver does _anything_.
>> I will try removing ionice completely, but then my system becomes
>> unresponsive :(
>
> Yep, see below.
>
> Then again if it only goes bad for a minute or two, then just launch
> the backup right as you go for a break.
>
>>>
>>> (1a) The ionice doesn't apply to the pipeline, it only applies to the
>>> command it precedes. So it's "ionice -c3 btrfs send..." then pipeline
>>> then "btrfs receive" at the default io scheduling class. You need to
>>> specify it twice, or wrap it all in a script.
>>>
>>> ionice -c 3 btrfs send -p /ssd/parent /ssd/snapshot-X |
>>> ionice -c 3 btrfs receive /hdd/snapshots
>> This is usually what I do, but I wanted to show that there is no throttle
>> on the receiver. (I tested it with and without; the result is the same.)
>>>
>>> (1b) Your comparison case is flawed because cat >/dev/null results in
>>> no actual IO (e.g. writing to dev-null doesn't transfer any data
>>> anywhere, it just gets rubber-stamped okay at the kernel method level).
>> This was only intended to show that the sender itself is OK.
>
> I understood why you did it, I was just trying to point out that since
> there was no other IO competing with the btrfs send, it would give you
> a really outrageous false positive. Particularly if you always
> used ionice.
>
>>>
>>> (2) You probably get negative-to-no value from using ionice on the
>>> sending side, particularly since SSDs don't have physical heads to
>>> seek around.
>> Yeah, in theory it should be like this, but in practice on my system,
>> when I use no ionice my system becomes very unresponsive (Ubuntu 14.10).
>
> What all is in the snapshot? Is it your whole system or just /home or
> what? e.g. what are your subvolume boundaries if any?
>
> btrfs send is very efficient, but that efficiency means that it can
> rifle through a heck of a lot of the parent snapshot and decide it
> doesn't need sending, and it can do so very fast, and that can be a
> huge hit on other activities. If most of your system doesn't change
> between snapshots the send will plow through your disk yelling "nope"
> and "skip this" like a shopper in a Black Friday riot.
>
>
>>> (2a) The value of nicing your IO is trivial on the actual SATA bus,
>>> the real value is only realized on rotating media where the cost of
>>> interrupting other-normal-process is very high because you have to
>>> seek the heads way over there---> when other-normal-process needs them
>>> right here.
>>>
>>> (2b) Any pipeline will naturally throttle a more-left-end process to
>>> wait for a more-right-end process to read the data. The default buffer
>>> is really small, living in the neighborhood of "MAX_PIPE" so like 5k
>>> last time I looked. If you want to throttle this sort of transfer just
>>> throttle the writer. So..
>>>
>>> btrfs send -p /ssd/parent /ssd/snapshot-X |
>>> ionice -c 3 btrfs receive /hdd/snapshots
>>>
>>> (3) You need to find out if you are treating things nicely on
>>> /hdd/snapshots. If you've hoarded a lot of snapshots on there, or your
>>> snapshot history is really deep, or you are using the drive for other
>>> purposes that are unfriendly to large allocations, or you've laid out
>>> a multi-drive array poorly, then it may need some maintenance.
>> Yes, this is what I suspect too, that the filesystem is too fragmented.
>> I have about 15 snapshots now, but new snapshots are created and older
>> ones are deleted. Is it possible that this caused the problem?
>> Is there a way to tell how badly the file system is fragmented?
>
> Fifteen snapshots is fine. There have been some people on here taking
> snapshots every hour and keeping them for months. That gets excessive.
> As long as there is a reasonable amount of free space and you don't
> have weeks of hourly snapshots hanging around, this shouldn't be an
> issue.
>
>>>
>>> (3a) Test the raw receive throughput if you have the space to share.
>>> To do this save the output of btrfs send to a file with -f some_file.
>>> Then run the receive with -f some_file. Ideally some_file will be on
>>> yet a third medium, but it's okay if it's on /hdd/snapshots somewhere.
>>> Watch the output of iotop or your favorite graphical monitoring
>>> daemon. If the write throughput is suspiciously low you may be dealing
>>> with a fragmented filesystem.
>> Great idea. Will try this.
>>>
>>> (3b) If /hdd/snapshots is a multi-device filesystem and you are using
>>> "single" for data extents, try switching to RAID0. It's just as
>>> _unsafe_ as "single" but its distributed write layout will speed up
>>> your storage.
>> It's a single device.
>>>
>>> (4) If your system is busy enough that you really need the ionice, you
>>> likely just need to really re-think your storage layouts and whatnot.
>> Well, it's a laptop :) and I'm not doing anything when the sync happens.
>> I do ionice because without it the system becomes unresponsive and I
>> can't even browse the internet (it just freezes for 20 seconds or so).
>> But this has to do with the sender somehow... (probably saturates the
>> SATA throughput at 500 MB/s?)
>
> On a laptop, yea, that's probably gonna happen way more than on a
> server of some sort.
>
> There are a lot of interactions between the "hot" parts of programs,
> the way programs are brought into memory with mmap(), and how the disk
> cache works. When you start hoovering things up off your hard disk,
> particularly while send is comparing the snapshots and finding what
> it's _not_ going to send (such as all of /bin /usr/bin /lib /usr/lib
> etc) that high performance drive with that high performance bus will
> just muscle-aside significant parts of your browser and all the other
> "user facing" stuff.
>
> Then you have to get back in line to re-fetch the stuff that just got
> dropped from memory when you go back to your browser or whatever.
>
> With a fast SSD and fast SATA bus you _are_ going to feel the send if
> it's full speed. Your system is going to be _busy_.
>
> But that's a separate thing from your question about effective
> throughput. The pipeline you gave with both ends ioniced will turn
> into a dainty little tea-party with each actor walking in lock-step
> and repeatedly saying "no, after you" and then waiting for the other
> to finish.
>
> Check out the ionice page description of the Idle scheduler...
>
> A program running with idle io priority will only get disk time when
> no other program has asked for disk io for a defined *grace* *period*.
> The impact of idle io processes on normal system activity should be
> zero. This scheduling class does not take a priority argument.
> Presently, this scheduling class is permitted for an ordinary user
> (since kernel 2.6.25).
>
> Grace Period. Think about those two words...
>
> So btrfs send goes out and tells receive to create a file named XXX,
> as soon as receive acts on that message btrfs send _stops_ _dead_,
> waits for it to finish, waits for the grace period, _then_ does its
> next thing.
>
> If they are _both_ niced then they _both_ wait for each other with a
> little grace period before the other gets to go.
>
> That will take _all_ the time... So much so that you're darn right, the
> send/receive will _not_ get the chance to affect your browser.
>
> Oh, and every cookie your browser writes will barge through and stop
> them both.
>
> So yeah... _slow_... really, really, slow with two ionice processes in
> a pipeline.
>
> I'd just leave off the ionice and schedule my snapshot resends for
> right before I take a bathroom break... 8-)
>
>
>>> (5) Remember to evaluate the secondary system effects. For example, if
>>> /hdd/snapshots is really a USB attached external storage unit, make
>>> sure it's USB3 and so is the port you're plugging it into. Make sure
>>> you aren't paging/swapping in the general sense (or just on the cusp
>>> of doing so) as both of the programs are going to start competing with
>>> your system for buffer cache and whatnot. A "nearly busy" system can
>>> be pushed over the edge into thrashing by adding two large-IO tasks
>>> like these.
>>>
>>> (5b) Give /hdd/snapshots the once-over with smartmontools, especially
>>> if it's old, to make sure it's not starting to have read/write retry
>>> delays. Old disks can get "slower" before they get all "failed".
>>>
>>> And remember, you may not be able to do squat about your results. If
>>> you are I/O bound, and you just _can't_ bear to part with some subset
>>> of your snapshot hoard for any reason (e.g. I know some government
>>> programs with hellish retention policies), then you might just be
>>> living the practical outcome of your policies and available hardware.
>>>
>>>
>> Thanks again for the answer. I will try the tests described here
>> and get back.
>> Cheers!
>>
>


