* Extremely slow metadata performance
@ 2013-12-05 17:52 John Goerzen
2013-12-05 19:39 ` Duncan
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: John Goerzen @ 2013-12-05 17:52 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
Hello,
I have observed extremely slow metadata performance with btrfs. This may
be a bit of a nightmare scenario; it involves untarring a backup of
1.6TB of backuppc data, which contains millions of hardlinks and much
data, onto USB 2.0 disks.
I have run disk monitoring tools such as dstat while performing these
operations to see what's going on.
The behavior I notice is this:
* When unpacking large files, the USB drives sustain activity in the
20-40 MB/s range, as expected.
* When creating vast numbers of hardlinks instead, the activity is
roughly this:
o Bursts of output from tar due to -v, sometimes corresponding to
reads in the 300KB/s range (I suspect this has
to do with caching)
o Tar blocked for minutes while writes to the disk occur, in the
300-600KB/s range.
This occurs even when nobarrier,noatime are specified as mount options.
I know the disk is capable of far more, because btrfs gets
far more from it when writing large files.
There are two USB drives in this btrfs filesystem: a 1TB and a 2TB
drive. I have tried the raid1, raid0, and single metadata profiles.
Anecdotal evidence suggests that raid1 performs the worst, raid0 the
best, and single somewhere in between. The data is in single mode.
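[A sketch of how the three metadata profiles mentioned above are selected at mkfs time; the device names are placeholders, not the poster's actual commands:]

```shell
# Placeholder devices standing in for the 1TB and 2TB USB drives.
mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc   # metadata mirrored on both drives
mkfs.btrfs -d single -m raid0 /dev/sdb /dev/sdc   # metadata striped across both
mkfs.btrfs -d single -m single /dev/sdb /dev/sdc  # one copy, no duplication
```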
Is this behavior known and expected?
Thanks,
John
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Extremely slow metadata performance
2013-12-05 17:52 Extremely slow metadata performance John Goerzen
@ 2013-12-05 19:39 ` Duncan
2013-12-07 9:22 ` Marc MERLIN
2013-12-05 23:41 ` Russell Coker
[not found] ` <5930575.71jgM0vnzg@xev>
2 siblings, 1 reply; 7+ messages in thread
From: Duncan @ 2013-12-05 19:39 UTC (permalink / raw)
To: linux-btrfs
John Goerzen posted on Thu, 05 Dec 2013 11:52:04 -0600 as excerpted:
> Hello,
>
> I have observed extremely slow metadata performance with btrfs. This may
> be a bit of a nightmare scenario; it involves untarring a backup of
> 1.6TB of backuppc data, which contains millions of hardlinks and much
> data, onto USB 2.0 disks.
> Is this behavior known and expected?
Yes. Btrfs doesn't do well with lots of hardlinks, and indeed until
relatively recently it had a hard limit on the number of hardlinks possible
within a directory, which hardlink-heavy use-cases would regularly hit.
That was worked around, but there's an additional level of indirection
once the first-level link pool is filled, and you're not the first to
have observed that btrfs performance isn't the best in that sort of
scenario. That's known.
Other filesystems will probably do quite a bit better for hardlink style
backups and other hardlink-heavy use-cases. Either that, or consider
using btrfs, but with some other form of backup, possibly btrfs
snapshots, or COW reflinks.
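[The two alternatives mentioned above can be sketched as follows; the paths are placeholders:]

```shell
# A read-only snapshot of a subvolume, versus a reflink (COW) copy of a file.
btrfs subvolume snapshot -r /mnt/pool/data /mnt/pool/snap-2013-12-05
cp -a --reflink=always /mnt/pool/data/bigfile /mnt/pool/bigfile.copy
```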
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Extremely slow metadata performance
2013-12-05 17:52 Extremely slow metadata performance John Goerzen
2013-12-05 19:39 ` Duncan
@ 2013-12-05 23:41 ` Russell Coker
[not found] ` <5930575.71jgM0vnzg@xev>
2 siblings, 0 replies; 7+ messages in thread
From: Russell Coker @ 2013-12-05 23:41 UTC (permalink / raw)
To: John Goerzen; +Cc: linux-btrfs@vger.kernel.org
On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:
> I have observed extremely slow metadata performance with btrfs. This may
> be a bit of a nightmare scenario; it involves untarring a backup of
> 1.6TB of backuppc data, which contains millions of hardlinks and much
> data, onto USB 2.0 disks.
How does this compare to using Ext4 on the same hardware and same data?
> I have run disk monitoring tools such as dstat while performing these
> operations to see what's going on.
>
> The behavior I notice is this:
>
> * When unpacking large files, the USB drives sustain activity in the
> 20-40 MB/s range, as expected.
> * When creating vast numbers of hardlinks instead, the activity is
> roughly this:
> o Bursts of output from tar due to -v, sometimes corresponding to
> reads in the 300KB/s range (I suspect this has
> to do with caching)
> o Tar blocked for minutes while writes to the disk occur, in the
> 300-600KB/s range.
Is iostat indicating that the disk is at 100% capacity?
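[A sketch of the check being suggested; the device name is a placeholder and iostat comes from the sysstat package:]

```shell
# %util near 100 means the device is saturated; await shows per-request latency.
iostat -x 5 sdb
```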
> This occurs even when nobarrier,noatime are specified as mount options.
> I know the disk is capable of far more, because btrfs gets
> far more from it when writing large files.
Write speeds as low as 600KB/s aren't uncommon when there are lots of seeks.
I've seen similar performance from RAID arrays. Is BTRFS doing much worse
than Ext4 in terms of the number of seeks needed to write that data?
> There are two USB drives in this btrfs filesystem: a 1TB and a 2TB
> drive. I have tried the raid1, raid0, and single metadata profiles.
> Anecdotal evidence suggests that raid1 performs the worst, raid0 the
> best, and single somewhere in between. The data is in single mode.
>
> Is this behavior known and expected?
The last time I ran Postal tests I didn't find a lot of difference between BTRFS
and Ext4 when using a 60K average file size (from memory) on a single partition.
BTRFS did worse when it was using internal RAID-1 and Ext4 was on Linux
software RAID-1. But a 60K average file may be larger than your typical file
if you use lots of links.
For backups BTRFS seems to perform a lot better (for both creating and
removing snapshots) if you use snapshots instead of "cp -rl" or equivalent.
However, I have had ongoing problems with BTRFS hanging on snapshot removal
which aren't fixed in the latest Debian-packaged kernels.
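[The snapshot-based backup cycle described above, sketched with hypothetical paths:]

```shell
# Instead of "cp -rl" hardlink farms: snapshot the previous backup to a
# dated name, then rsync the live data into the current subvolume.
btrfs subvolume snapshot /backup/current "/backup/$(date +%F)"
rsync -aHAX --delete /home/ /backup/current/
```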
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Extremely slow metadata performance
[not found] ` <5930575.71jgM0vnzg@xev>
@ 2013-12-06 4:48 ` John Goerzen
2013-12-06 14:35 ` John Goerzen
1 sibling, 0 replies; 7+ messages in thread
From: John Goerzen @ 2013-12-06 4:48 UTC (permalink / raw)
To: russell; +Cc: linux-btrfs@vger.kernel.org
On 12/05/2013 05:32 PM, Russell Coker wrote:
> On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:
>
> > I have observed extremely slow metadata performance with btrfs. This may
>
> > be a bit of a nightmare scenario; it involves untarring a backup of
>
> > 1.6TB of backuppc data, which contains millions of hardlinks and much
>
> > data, onto USB 2.0 disks.
>
> How does this compare to using Ext4 on the same hardware and same data?
Hi Russell,
I can't perform a direct apples-to-apples comparison here, because the
capabilities of the filesystems are dissimilar. We're talking two USB
drives, one of them 1TB and the other 2TB. With ext4, I used LVM to
combine them into a single volume (no striping).
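[The LVM setup described, joining the two disks linearly, would look roughly like this; device names are placeholders:]

```shell
pvcreate /dev/sdb1 /dev/sdc1
vgcreate backup_vg /dev/sdb1 /dev/sdc1
lvcreate -l 100%FREE -n backup_lv backup_vg   # linear allocation, no striping
mkfs.ext4 /dev/backup_vg/backup_lv
```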
With the best case in btrfs (-m raid0 -d raid0 -- yes I now know that
wastes space), it is still slower than ext4. Overall performance with
backuppc is somewhat slower. Creation and sometimes deletion of vast
numbers of hardlinks, or of vast numbers of empty directories, is much
slower and can lead to processes blocked waiting for I/O to complete for
so long that they trigger kernel hung task warnings in dmesg with btrfs.
Even a simple ls on a directory with <20 files can take minutes to
complete when tar is creating these directories or links.
One other data point: zfs, even zfs-fuse, is significantly faster on the
exact same workload.
> Write speeds as low as 600KB/s aren't uncommon when there are lots of
> seeks. I've seen similar performance from RAID arrays. Is BTRFS doing
> much worse than Ext4 in terms of the number of seeks needed to write
> that data?
The strange thing is that these writes come in bursts, during which
userland access to the filesystem is apparently paused. This suggests
to me that there is some caching going on (perfectly fine). But given
that, couldn't some reordering of the writes be taking place?
usb-storage does not support NCQ, so perhaps this is also an issue of
higher latency on USB vs. SATA/SCSI.
-- John
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Extremely slow metadata performance
2013-12-05 19:39 ` Duncan
@ 2013-12-07 9:22 ` Marc MERLIN
2013-12-07 17:10 ` Kai Krakow
0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2013-12-07 9:22 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On Thu, Dec 05, 2013 at 07:39:30PM +0000, Duncan wrote:
> John Goerzen posted on Thu, 05 Dec 2013 11:52:04 -0600 as excerpted:
>
> > Hello,
> >
> > I have observed extremely slow metadata performance with btrfs. This may
> > be a bit of a nightmare scenario; it involves untarring a backup of
> > 1.6TB of backuppc data, which contains millions of hardlinks and much
> > data, onto USB 2.0 disks.
>
> > Is this behavior known and expected?
>
> Yes. Btrfs doesn't do well with lots of hardlinks, and indeed until
> relatively recently it had a hard limit on the number of hardlinks possible
> within a directory, which hardlink-heavy use-cases would regularly hit.
> That was worked around, but there's an additional level of indirection
> once the first-level link pool is filled, and you're not the first to
> have observed that btrfs performance isn't the best in that sort of
> scenario. That's known.
>
> Other filesystems will probably do quite a bit better for hardlink style
> backups and other hardlink-heavy use-cases. Either that, or consider
> using btrfs, but with some other form of backup, possibly btrfs
> snapshots, or COW reflinks.
Thanks for explaining this.
I'm one of those people who uses cp -al and rsync to do backups. Indeed
I should likely rework the flow to use subvolumes and snapshots.
You also mentioned reflinks, and it sounds like I can use
cp -a --reflink instead of cp -al.
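[The difference between the two forms, as a minimal runnable sketch. The temporary paths are hypothetical, and --reflink=auto falls back to a plain copy on filesystems without reflink support; on btrfs you would use --reflink=always:]

```shell
mkdir -p /tmp/demo && echo data > /tmp/demo/f
rm -rf /tmp/demo-links /tmp/demo-cow
cp -al /tmp/demo /tmp/demo-links              # hardlink farm: files share inodes
cp -a --reflink=auto /tmp/demo /tmp/demo-cow  # reflink (COW) copy: new inodes
stat -c %i /tmp/demo/f /tmp/demo-links/f      # identical inode numbers
```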
Also, would the dedupe code in btrfs effectively allow for the same
thing after the fact if you use cp without --reflink? Is it stable
enough nowadays?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Extremely slow metadata performance
2013-12-07 9:22 ` Marc MERLIN
@ 2013-12-07 17:10 ` Kai Krakow
0 siblings, 0 replies; 7+ messages in thread
From: Kai Krakow @ 2013-12-07 17:10 UTC (permalink / raw)
To: linux-btrfs
Marc MERLIN <marc@merlins.org> schrieb:
> I'm one of those people who uses cp -al and rsync to do backups. Indeed
> I should likely rework the flow to use subvolumes and snapshots.
> You also mentioned reflinks, and it sounds like I can use
> cp -a --reflink instead of cp -al.
>
> Also, would the dedupe code in btrfs effectively allow for the same
> thing after the fact if you use cp without --reflink? Is it stable
> enough nowadays?
You may want to try my backup script as a starting point:
https://gist.github.com/kakra/5520370
It uses a scratch area to create a mirror of your system, then, if that was
successful, takes a snapshot of it. The next time the backup runs, the scratch
area is updated with rsync using its features for in-file delta updates, so
you get optimally small deltas between snapshots. Normally rsync would
create a new copy of each file in local mode, so using these features is
important.
My script is meant to be run from systemd with an auto-mounted backup disk,
but it should be easy to adapt to cron if you need to. All the work is done
within the bash script.
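[The flow described above, reduced to its core; paths and options are illustrative, not taken from the actual gist:]

```shell
SCRATCH=/mnt/backup/scratch     # plain subvolume, updated in place each run
SNAPS=/mnt/backup/snapshots
# --inplace/--no-whole-file enable rsync's in-file delta updates in local
# mode; the read-only snapshot is only taken if the sync succeeded.
rsync -aHAX --delete --inplace --no-whole-file / "$SCRATCH/" \
  && btrfs subvolume snapshot -r "$SCRATCH" "$SNAPS/$(date +%F)"
```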
You may wonder about the scratch area. I took this route to ensure that
snapshots always contain clean and consistent backups. If there is a
problem during rsync, no snapshot is taken. It's that easy. Usually I
see other solutions take a snapshot first and then modify that snapshot,
so you never know whether the snapshots are clean and consistent. With my
approach you get clean snapshots, and even if the last backup broke and you
are thus missing a snapshot, you still have the data in the scratch area to
recover from.
In my scenario the script holds several weeks of daily backups in the
backlog. The destination is mounted with compress-force=zlib, so I get good
compression, too. However, I haven't yet had time to implement automatic
cleanup of old snapshots. Feel free to add it.
HTH
Kai
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads: [~2013-12-07 17:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-05 17:52 Extremely slow metadata performance John Goerzen
2013-12-05 19:39 ` Duncan
2013-12-07 9:22 ` Marc MERLIN
2013-12-07 17:10 ` Kai Krakow
2013-12-05 23:41 ` Russell Coker
[not found] ` <5930575.71jgM0vnzg@xev>
2013-12-06 4:48 ` John Goerzen
2013-12-06 14:35 ` John Goerzen