From: Nick Dimov <dimovnike@gmail.com>
To: Robert White <rwhite@pobox.com>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs receive being very slow
Date: Fri, 19 Dec 2014 18:22:47 +0200
Message-ID: <549450D7.7040207@gmail.com>
In-Reply-To: <548EA090.60308@pobox.com>
Hello.
So I split the job into two tasks as you suggested. I create the
differential snapshot stream with btrfs send and save it to a file on the
SSD - so far this is very efficient and the send runs at almost full SSD
speed. But when I "receive" that stream on the HDD, the speed is just as
low as before (the same as with the ionice'd pipe), and no ionice is used
this time.
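For reference, the split looks roughly like this (the stream file name is
just an example, not the exact path I use):

  # step 1: dump the incremental stream to a file on the SSD - this part is fast
  btrfs send -p /ssd/previously_synced_snapshot -f /ssd/snapshot-X.stream /ssd/snapshot-X

  # step 2: replay the saved stream onto the HDD - this is the slow part
  btrfs receive -f /ssd/snapshot-X.stream /hdd/snapshots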
The HDD's raw speed, according to hdparm, is:
Timing cached reads: 15848 MB in 2.00 seconds = 7928.82 MB/sec
Timing buffered disk reads: 310 MB in 3.01 seconds = 103.02 MB/sec
And I have more than 100GB of free space on it, but the receive is still slow.
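(In case it matters, the numbers above came from something like

  hdparm -tT /dev/sdb

with /dev/sdb being the HDD.)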
So, as you mentioned, I might indeed be dealing with a very fragmented
filesystem. That leads to some conclusions and questions:
1. btrfs send is not the problem - it works great with or without
   ionice.
2. btrfs receive is slow no matter what I do, even when run on its own.
   (As for what kind of data is being sent: I sent snapshots of / and
   /home, and both are slow to receive.)
3. How can I check how fragmented the filesystem is? (I.e. I want to know
   whether this is the real cause - see the rough filefrag check sketched
   below this list.)
4. How can I defragment all those read-only snapshots without breaking
   compatibility with differential btrfs send? (If I understand it
   correctly, the parent snapshot must be identical on the source and the
   destination - is that correct?)
5. Will making those snapshots writable, defragmenting them and
   re-snapshotting them as read-only break compatibility with differential
   btrfs send? E.g. will I still be able to "btrfs receive" a
   differential snapshot after defragmentation?
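Regarding question 3, I was thinking of spot-checking a few big files with
filefrag, something along these lines (the path is just an example, and I
realize extent counts can be misleading if compression is enabled):

  # quick extent count for one file on the HDD copy
  filefrag /hdd/snapshots/snapshot-X/some/large/file

  # or with per-extent detail
  filefrag -v /hdd/snapshots/snapshot-X/some/large/file

Is that a reasonable way to judge it, or is there a better tool for btrfs?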
Also, about your suggestion to run the sync during a break - I would do
that, but it sometimes takes hours, which is why I tried to ionice it so
that I can keep working while it runs.
Thank you very much for your explanations and effort!
On 15.12.2014 10:49, Robert White wrote:
> On 12/14/2014 11:41 PM, Nick Dimov wrote:
>> Hi, thanks for the answer, I will answer between the lines.
>>
>> On 15.12.2014 08:45, Robert White wrote:
>>> On 12/14/2014 08:50 PM, Nick Dimov wrote:
>>>> Hello everyone!
>>>>
>>>> First, thanks for the amazing work on the btrfs filesystem!
>>>>
>>>> Now the problem:
>>>> I use an SSD as my system drive (/dev/sda2) and take daily snapshots on
>>>> it. Then, from time to time, I sync them to the HDD (/dev/sdb4) using
>>>> btrfs send / receive like this:
>>>>
>>>> ionice -c3 btrfs send -p /ssd/previously_synced_snapshot /ssd/snapshot-X |
>>>>   pv | btrfs receive /hdd/snapshots
>>>>
>>>> I use pv to measure the speed and I get ridiculously low speeds like
>>>> 5-200 KiB/s (it rarely goes over 1 MiB/s). However, if I replace the
>>>> btrfs receive with cat >/dev/null, the speed is 400-500 MiB/s (almost
>>>> full SSD speed), so I understand that the problem is the fs on the
>>>> HDD... Do you have any idea of how to trace this problem down?
>>>
>>>
>>> You have _lots_ of problems with that above...
>>>
>>> (1) your ionice is causing the SSD to stall the send every time the
>>> receiver does _anything_.
>> I will try removing ionice completely - but then my system becomes
>> unresponsive :(
>
> Yep, see below.
>
> Then again, if it only goes bad for a minute or two, just launch
> the backup right as you go for a break.
>
>>>
>>> (1a) The ionice doesn't apply to the pipeline, it only applies to the
>>> command it precedes. So it's "ionice -c btrfs send..." then pipeline
>>> then "btrfs receive" at the default io scheduling class. You need to
>>> specify it twice, or wrap it all in a script.
>>>
>>> ionice -c 3 btrfs send -p /ssd/parent /ssd/snapshot-X |
>>> ionice -c 3 btrfs receive /hdd/snapshots
>> This is usually what I do, but I wanted to show that there is no throttle
>> on the receiver. (I tested it with and without - the result is the same.)
>>>
>>> (1b) Your comparison case is flawed because cat >/dev/null results in
>>> no actual IO (e.g. writing to dev-null doesn't transfer any data
>>> anywhere, it just gets rubber-stamped okay at the kernel method level).
>> This was only meant to show that the sender itself is OK.
>
> I understood why you did it, I was just trying to point out that since
> there was no other IO competing with the btrfs send, it would give you
> a really outrageous false positive. Particularly if you always
> used ionice.
>
>>>
>>> (2) You probably get negative-to-no value from using ionice on the
>>> sending side, particularly since SSDs don't have physical heads to
>>> seek around.
>> Yeah, in theory it should be like this, but in practice, on my system,
>> when I use no ionice the system becomes very unresponsive (Ubuntu 14.10).
>
> What all is in the snapshot? Is it your whole system or just /home or
> what? e.g. what are your subvolume boundaries if any?
>
> btrfs send is very efficient, but that efficiency means that it can
> rifle through a heck of a lot of the parent snapshot and decide it
> doesn't need sending, and it can do so very fast, and that can be a
> huge hit on other activities. If most of your system doesn't change
> between snapshots the send will plow through your disk yelling "nope"
> and "skip this" like a shopper in a Black Friday riot.
>
>
>>> (2a) The value of nicing your IO is trivial on the actual SATA bus,
>>> the real value is only realized on a rotating media where the cost of
>>> interrupting other-normal-process is very high because you have to
>>> seek the heads way over there---> when other-normal-process needs them
>>> right here.
>>>
>>> (2b) Any pipeline will naturally throttle a more-left-end process to
>>> wait for a more-right-end process to read the data. The default buffer
>>> is really small, living in the neighborhood of "MAX_PIPE" so like 5k
>>> last time I looked. If you want to throttle this sort of transfer just
>>> throttle the writer. So..
>>>
>>> btrfs send -p /ssd/parent /ssd/snapshot-X |
>>> ionice -c 3 btrfs receive /hdd/snapshots
>>>
>>> (3) You need to find out if you are treating things nicely on
>>> /hdd/snapshots. If you've hoarded a lot of snapshots on there, or your
>>> snapshot history is really deep, or you are using the drive for other
>>> purposes that are unfriendly to large allocations, or you've laid out
>>> a multi-drive array poorly, then it may need some maintenance.
>> Yes, this is what I suspect too - that the filesystem is too fragmented. I
>> have about 15 snapshots now, but new snapshots are created and older
>> ones are deleted. Is it possible that this caused the problem?
>> Is there a way to tell how badly the filesystem is fragmented?
>
> Fifteen snapshots is fine. There have been some people on here taking
> snapshots every hour and keeping them for months. That gets excessive.
> As long as there is a reasonable amount of free space and you don't
> have weeks of hourly snapshots hanging around, this shouldn't be an
> issue.
>
>>>
>>> (3a) Test the raw receive throughput if you have the space to share.
>>> To do this save the output of btrfs send to a file with -f some_file.
>>> Then run the receive with -f some_file. Ideally some_file will be on
>>> yet a third device, but it's okay if it's on /hdd/snapshots somewhere.
>>> Watch the output of iotop or your favorite graphical monitoring
>>> daemon. If the write throughput is suspiciously low you may be dealing
>>> with a fragmented filesystem.
>> Great idea. Will try this.
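This is essentially what I did for this mail (see the top). While the
receive from the saved file runs, I intend to watch the HDD with something
like

  iotop -o -d 2

(only processes actually doing I/O, refreshed every 2 seconds) to confirm
whether the write throughput really is that low.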
>>>
>>> (3b) If /hdd/snapshots is a multi-device filesystem and you are using
>>> "single" for data extents, try switching to RAID0. It's just as
>>> _unsafe_ as "single" but its distributed write layout will speed up
>>> your storage.
>> It's a single-device filesystem.
>>>
>>> (4) If your system is busy enough that you really need the ionice, you
>>> likely just need to really re-think your storage layouts and whatnot.
>> Well, it's a laptop :) and I'm not doing anything when the sync happens.
>> I do ionice because without it the system becomes unresponsive and I
>> can't even browse the internet (it just freezes for 20 seconds or so).
>> But this has to do with the sender somehow... (probably it saturates the
>> SATA throughput at 500 MB/s?)
>
> On a laptop, yea, that's probably gonna happen way more than on a
> server of some sort.
>
> There are a lot of interactions between the "hot" parts of programs,
> the way programs are brought into memory with mmap(), and how the disk
> cache works. When you start hoovering things up off your hard disk,
> particularly while send is comparing the snapshots and finding what
> it's _not_ going to send (such as all of /bin /usr/bin /lib /usr/lib
> etc) that high performance drive with that high performance bus will
> just muscle-aside significant parts of your browser and all the other
> "user facing" stuff.
>
> Then you have to get back in line to re-fetch the stuff that just got
> dropped from memory when you go back to your browser or whatever.
>
> With a fast SSD and fast SATA bus you _are_ going to feel the send if
> it's at full speed. Your system is going to be _busy_.
>
> But that's a separate thing from your question about effective
> throughput. The pipeline you gave with both ends ioniced will turn
> into a dainty little tea-party with each actor walking in lock-step
> and repeatedly saying "no, after you" and then waiting for the other
> to finish.
>
> Check out the ionice page description of the Idle scheduler...
>
> A program running with idle io priority will only get disk time when
> no other program has asked for disk io for a defined *grace* *period*.
> The impact of idle io processes on normal system activity should be
> zero. This scheduling class does not take a priority argument.
> Presently, this scheduling class is permitted for an ordinary user
> (since kernel 2.6.25).
>
> Grace Period. Think about those two words...
>
> So btrfs send goes out and tells receive to create a file named XXX,
> as soon as receive acts on that message btrfs send _stops_ _dead_,
> waits for it to finish, waits for the grace period, _then_ does its
> next thing.
>
> If they are _both_ niced then they _both_ wait for each other with a
> little grace period before the other gets to go.
>
> That will take _all_ the time... So much so that you're darn right, the
> send/receive will _not_ get the chance to affect your browser.
>
> Oh, and every cookie your browser writes will barge through and stop
> them both.
>
> So yeah... _slow_... really, really, slow with two ionice processes in
> a pipeline.
>
> I'd just leave off the ionice and schedule my snapshot resends for
> right before I take a bathroom break... 8-)
>
>
>>> (5) Remember to evaluate the secondary system effects. For example, if
>>> /hdd/snapshots is really a USB attached external storage unit, make
>>> sure it's USB3 and so is the port you're plugging it into. Make sure
>>> you aren't paging/swapping in the general sense (or just on the cusp
>>> of doing so) as both of the programs are going to start competing with
>>> your system for buffer cache and whatnot. A "nearly busy" system can
>>> be pushed over the edge into thrashing by adding two large-IO tasks
>>> like these.
>>>
>>> (5b) Give /hdd/snapshots the once-over with smartmontools, especially
>>> if it's old, to make sure it's not starting to have read/write retry
>>> delays. Old disks can get "slower" before they get all "failed".
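I will also give the HDD a quick once-over with smartmontools as you
suggest, probably just something like

  smartctl -H /dev/sdb   # overall health self-assessment
  smartctl -a /dev/sdb   # full attributes, to look for reallocated/pending sectors

to rule that out.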
>>>
>>> And remember, you may not be able to do squat about your results. If
>>> you are I/O bound, and you just _can't_ bear to part with some subset
>>> of your snapshot hoard for any reason (e.g. I know some government
>>> programs with hellish retention policies), then you might just be
>>> living the practical outcome of your policies and available hardware.
>>>
>>>
>> Thanks again for the answer. I will try the tests described here
>> and get back to you.
>> Cheers!
>>
>