linux-btrfs.vger.kernel.org archive mirror
* Extremely slow metadata performance
@ 2013-12-05 17:52 John Goerzen
  2013-12-05 19:39 ` Duncan
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: John Goerzen @ 2013-12-05 17:52 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

Hello,

I have observed extremely slow metadata performance with btrfs. This may 
be a bit of a nightmare scenario; it involves untarring a backup of
1.6TB of backuppc data, which contains millions of hardlinks and much
data, onto USB 2.0 disks.

I have run disk monitoring tools such as dstat while performing these
operations to see what's going on.

The behavior I notice is this:

   * When unpacking large files, the USB drives sustain activity in the
     20-40 MB/s range, as expected.
   * When creating vast numbers of hardlinks instead, the activity is
     roughly this:
       o Bursts of output from tar due to -v, sometimes corresponding to
         reads in the 300KB/s range (I suspect this has
         to do with caching)
       o Tar blocked for minutes while writes to the disk occur, in the
         300-600KB/s range.

This occurs even when nobarrier,noatime are specified as mount options.
I know the disk is capable of far more, because btrfs sustains far higher
throughput on it when writing large files.

There are two USB drives in this btrfs filesystem: a 1TB and a 2TB
drive.  I have tried the raid1, raid0, and single metadata profiles.
Anecdotal evidence suggests that raid1 performs the worst, raid0 the
best, and single somewhere in between.  The data is in single mode.
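
For readers trying to reproduce this: the metadata and data profiles referred
to here are chosen at mkfs time and can also be converted later with a
balance. A sketch, assuming the two drives show up as /dev/sdb and /dev/sdc
and the filesystem is mounted at /mnt:

```sh
# -m selects the metadata profile, -d the data profile
mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc

# Convert the metadata profile of an existing, mounted filesystem
# (newer btrfs-progs may require -f when reducing metadata redundancy)
btrfs balance start -mconvert=raid0 /mnt
```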

Is this behavior known and expected?

Thanks,

John

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Extremely slow metadata performance
  2013-12-05 17:52 Extremely slow metadata performance John Goerzen
@ 2013-12-05 19:39 ` Duncan
  2013-12-07  9:22   ` Marc MERLIN
  2013-12-05 23:41 ` Russell Coker
       [not found] ` <5930575.71jgM0vnzg@xev>
  2 siblings, 1 reply; 7+ messages in thread
From: Duncan @ 2013-12-05 19:39 UTC (permalink / raw)
  To: linux-btrfs

John Goerzen posted on Thu, 05 Dec 2013 11:52:04 -0600 as excerpted:

> Hello,
> 
> I have observed extremely slow metadata performance with btrfs. This may
> be a bit of a nightmare scenario; it involves untarring a backup of
> 1.6TB of backuppc data, which contains millions of hardlinks and much
> data, onto USB 2.0 disks.

> Is this behavior known and expected?

Yes.  Btrfs doesn't do well with lots of hardlinks, and indeed until 
relatively recently it had a hard limit on the number of hardlinks possible 
within a directory, one that hardlink-heavy use cases would regularly hit.  
That was worked around, but there's an additional level of indirection 
once the first-level link pool is filled, and you're not the first to 
have observed that btrfs performance isn't the best in that sort of 
scenario.  That's known.
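
To make concrete what a hardlink-heavy restore asks of the filesystem: every
link is another directory entry pointing at the same inode, and on btrfs
another back-reference item in the metadata trees. A small, portable
illustration (temporary paths, nothing btrfs-specific):

```shell
set -e
dir=$(mktemp -d)
echo payload > "$dir/file"

# Create 1000 additional names for the same inode, as a backuppc-style
# restore does millions of times over.
for i in $(seq 1 1000); do
    ln "$dir/file" "$dir/link.$i"
done

links=$(stat -c %h "$dir/file")
echo "link count: $links"   # 1 original name + 1000 hardlinks = 1001
```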

Other filesystems will probably do quite a bit better for hardlink style 
backups and other hardlink-heavy use-cases.  Either that, or consider 
using btrfs, but with some other form of backup, possibly btrfs 
snapshots, or COW reflinks.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Extremely slow metadata performance
  2013-12-05 17:52 Extremely slow metadata performance John Goerzen
  2013-12-05 19:39 ` Duncan
@ 2013-12-05 23:41 ` Russell Coker
       [not found] ` <5930575.71jgM0vnzg@xev>
  2 siblings, 0 replies; 7+ messages in thread
From: Russell Coker @ 2013-12-05 23:41 UTC (permalink / raw)
  To: John Goerzen; +Cc: linux-btrfs@vger.kernel.org

On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:
> I have observed extremely slow metadata performance with btrfs. This may
> be a bit of a nightmare scenario; it involves untarring a backup of
> 1.6TB of backuppc data, which contains millions of hardlinks and much
> data, onto USB 2.0 disks.

How does this compare to using Ext4 on the same hardware and same data?

> I have run disk monitoring tools such as dstat while performing these
> operations to see what's going on.
> 
> The behavior I notice is this:
> 
>    * When unpacking large files, the USB drives sustain activity in the
>      20-40 MB/s range, as expected.
>    * When creating vast numbers of hardlinks instead, the activity is
>      roughly this:
>        o Bursts of output from tar due to -v, sometimes corresponding to
>          reads in the 300KB/s range (I suspect this has
>          to do with caching)
>        o Tar blocked for minutes while writes to the disk occur, in the
>          300-600KB/s range.

Is iostat indicating that the disk is at 100% capacity?

> This occurs even when nobarrier,noatime are specified as mount options.
>   I know the disk is capable of far more, because btrfs gets
> far more from it when writing large files.

Write speeds as low as 600KB/s aren't uncommon when there are lots of seeks.
I've seen similar performance from RAID arrays.  Is BTRFS doing much worse
than Ext4 in terms of the number of seeks needed to write that data?

> There are two USB drives in this btrfs filesystem: a 1TB and a 2TB
> drive.  I have tried the raid1, raid0, and single metadata profiles.
> Anecdotal evidence suggests that raid1 performs the worst, raid0 the
> best, and single somewhere in between.  The data is in single mode.
> 
> Is this behavior known and expected?

The last time I did Postal tests I didn't find a lot of difference between
BTRFS and Ext4 when using a 60K average file size (from memory) on a single
partition.  BTRFS did worse when it was using internal RAID-1 and Ext4 was
on Linux software RAID-1.  But a 60K file size may be larger than the
average file you use if you use lots of links.

For backups BTRFS seems to perform a lot better (for both creating and 
removing snapshots) if you use snapshots instead of "cp -rl" or equivalent.  
However, I have had ongoing problems with BTRFS hanging on snapshot removal 
which aren't fixed in the latest Debian packaged kernels.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Extremely slow metadata performance
       [not found] ` <5930575.71jgM0vnzg@xev>
@ 2013-12-06  4:48   ` John Goerzen
  2013-12-06 14:35   ` John Goerzen
  1 sibling, 0 replies; 7+ messages in thread
From: John Goerzen @ 2013-12-06  4:48 UTC (permalink / raw)
  To: russell; +Cc: linux-btrfs@vger.kernel.org

On 12/05/2013 05:32 PM, Russell Coker wrote:
> On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:
> > I have observed extremely slow metadata performance with btrfs. This may
> > be a bit of a nightmare scenario; it involves untarring a backup of
> > 1.6TB of backuppc data, which contains millions of hardlinks and much
> > data, onto USB 2.0 disks.
> 
> How does this compare to using Ext4 on the same hardware and same data?

Hi Russell,

I can't perform a direct apples-to-apples comparison here, because the 
capabilities of the filesystems are dissimilar.  We're talking two USB 
drives, one of them 1TB and the other 2TB.  With ext4, I used LVM to 
combine them into a single volume (no striping).

With the best case in btrfs (-m raid0 -d raid0 -- yes I now know that 
wastes space), it is still slower than ext4.  Overall performance with 
backuppc is somewhat slower.  Creation and sometimes deletion of vast 
numbers of hardlinks, or of vast numbers of empty directories, is much 
slower and can lead to processes blocked waiting for I/O to complete for 
so long that they trigger kernel hung task warnings in dmesg with btrfs.
Even a simple ls on a directory with <20 files can take minutes to
complete when tar is creating these directories or links.

One other data point: zfs, even zfs-fuse, is significantly faster on the
exact same workload as btrfs.

> Write speeds as low as 600KB/s isn't uncommon when there's lots of
> seeks. I've seen similar performance from RAID arrays. Is BTRFS doing
> much worse than Ext4 in terms of the number of seeks needed for writing
> that data?

The strange thing is that these writes come in bursts during which
userland access to the filesystem is apparently paused.  This suggests
to me that there is some caching going on (perfectly fine).  But given
that, shouldn't some write reordering be taking place as well?

usb-storage does not support NCQ, so perhaps also this is an issue of 
higher latency on USB vs. SATA/SCSI.

-- John

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Extremely slow metadata performance
       [not found] ` <5930575.71jgM0vnzg@xev>
  2013-12-06  4:48   ` John Goerzen
@ 2013-12-06 14:35   ` John Goerzen
  1 sibling, 0 replies; 7+ messages in thread
From: John Goerzen @ 2013-12-06 14:35 UTC (permalink / raw)
  To: russell; +Cc: linux-btrfs@vger.kernel.org

On 12/05/2013 05:32 PM, Russell Coker wrote:
> On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:
> > I have observed extremely slow metadata performance with btrfs. This may
> > be a bit of a nightmare scenario; it involves untarring a backup of
> > 1.6TB of backuppc data, which contains millions of hardlinks and much
> > data, onto USB 2.0 disks.
> 
> How does this compare to using Ext4 on the same hardware and same data?

Hi Russell,

I can't perform a direct apples-to-apples comparison here, because the 
capabilities of the filesystems are dissimilar.  We're talking two USB 
drives, one of them 1TB and the other 2TB.  With ext4, I used LVM to 
combine them into a single volume (no striping).

With the best case in btrfs (-m raid0 -d raid0 -- yes I now know that 
wastes space), it is still slower than ext4.  Overall performance with 
backuppc is somewhat slower.  Creation and sometimes deletion of vast 
numbers of hardlinks, or of vast numbers of empty directories, is much 
slower and can lead to processes blocked waiting for I/O to complete for 
so long that they trigger kernel hung task warnings in dmesg with btrfs.
Even a simple ls on a directory with <20 files can take minutes to
complete when tar is creating these directories or links.

One other data point: zfs, even zfs-fuse, is significantly faster on the
exact same workload as btrfs.

> Write speeds as low as 600KB/s isn't uncommon when there's lots of
> seeks. I've seen similar performance from RAID arrays. Is BTRFS doing
> much worse than Ext4 in terms of the number of seeks needed for writing
> that data?

The strange thing is that these writes come in bursts during which
userland access to the filesystem is apparently paused.  This suggests
to me that there is some caching going on (perfectly fine).  But given
that, shouldn't some write reordering be taking place as well?

usb-storage does not support NCQ, so perhaps also this is an issue of 
higher latency on USB vs. SATA/SCSI.

-- John

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Extremely slow metadata performance
  2013-12-05 19:39 ` Duncan
@ 2013-12-07  9:22   ` Marc MERLIN
  2013-12-07 17:10     ` Kai Krakow
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2013-12-07  9:22 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Thu, Dec 05, 2013 at 07:39:30PM +0000, Duncan wrote:
> John Goerzen posted on Thu, 05 Dec 2013 11:52:04 -0600 as excerpted:
> 
> > Hello,
> > 
> > I have observed extremely slow metadata performance with btrfs. This may
> > be a bit of a nightmare scenario; it involves untarring a backup of
> > 1.6TB of backuppc data, which contains millions of hardlinks and much
> > data, onto USB 2.0 disks.
> 
> > Is this behavior known and expected?
> 
> Yes.  Btrfs doesn't do well with lots of hardlinks, and indeed until 
> relatively recently it had a hard limit on the number of hardlinks possible 
> within a directory, one that hardlink-heavy use cases would regularly hit.  
> That was worked around, but there's an additional level of indirection 
> once the first-level link pool is filled, and you're not the first to 
> have observed that btrfs performance isn't the best in that sort of 
> scenario.  That's known.
> 
> Other filesystems will probably do quite a bit better for hardlink style 
> backups and other hardlink-heavy use-cases.  Either that, or consider 
> using btrfs, but with some other form of backup, possibly btrfs 
> snapshots, or COW reflinks.

Thanks for explaining this.

I'm one of those people who uses cp -al and rsync to do backups. Indeed
I should likely rework the flow to use subvolumes and snapshots.
You also mentioned reflinks, and it sounds like I can use
cp -a --reflink instead of cp -al.
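
The distinction matters: cp -al shares the inode itself, while cp -a
--reflink shares only the data extents, so each copy gets its own inode and
metadata. A quick way to see the difference (on a non-btrfs filesystem,
--reflink=auto quietly falls back to a plain copy):

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo "backup payload" > "$src/f"

# cp -al style: a hardlink, i.e. a second name for the same inode
ln "$src/f" "$dst/f.hard"

# cp --reflink style: a new inode; on btrfs its extents are shared,
# elsewhere --reflink=auto degrades to an ordinary copy
cp -a --reflink=auto "$src/f" "$dst/f.ref"

src_inode=$(stat -c %i "$src/f")
hard_inode=$(stat -c %i "$dst/f.hard")
ref_inode=$(stat -c %i "$dst/f.ref")
echo "hardlink shares inode: $src_inode vs $hard_inode"
echo "reflink has own inode: $src_inode vs $ref_inode"
```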

Also, would the dedupe code in btrfs effectively allow for the same
thing after the fact if you use cp without --reflink? Is it stable
enough nowadays?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Extremely slow metadata performance
  2013-12-07  9:22   ` Marc MERLIN
@ 2013-12-07 17:10     ` Kai Krakow
  0 siblings, 0 replies; 7+ messages in thread
From: Kai Krakow @ 2013-12-07 17:10 UTC (permalink / raw)
  To: linux-btrfs

Marc MERLIN <marc@merlins.org> schrieb:

> I'm one of those people who uses cp -al and rsync to do backups. Indeed
> I should likely rework the flow to use subvolumes and snapshots.
> You also mentioned reflinks, and it sounds like I can use
> cp -a --reflink instead of cp -al.
> 
> Also, would the dedupe code in btrfs effectively allow for the same
> thing after the fact if you use cp without --reflink? Is it stable
> enough nowadays?

You may want to try my backup script as a starting point:

https://gist.github.com/kakra/5520370

It uses a scratch area to create a mirror of your system and then, if that
was successful, takes a snapshot of it. The next time the backup runs, the
scratch area is updated with rsync using its features for in-file delta
updates, so you get optimally small deltas between snapshots. Normally,
rsync would create a new copy of each file in local mode, so using these
features is important.

My script is meant to be run from systemd with an auto-mounted backup disk.
But it should be easy to adapt it to cron if you need to. All the work is
done within the bash script.

You may wonder about the scratch area. I took this route to ensure that
snapshots always contain clean and consistent backups. If there is a
problem during rsync, there will be no snapshot. It's that easy. Usually I
see other solutions taking a snapshot first and then modifying that
snapshot, so you never know whether the snapshots are clean and consistent.
With my approach you get clean snapshots, and even if the last backup broke
and you are thus missing a snapshot, you still have the data in the scratch
area to recover from.
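
The flow described above (update the scratch area, snapshot only on success)
reduces to a few lines. Paths and the subvolume layout here are assumptions
for illustration, not taken from the script itself:

```sh
scratch=/mnt/backup/scratch          # plain subvolume, the rsync target
snap=/mnt/backup/snap-$(date +%F)    # read-only snapshot kept on success

if rsync -aHAX --inplace --no-whole-file --delete /home/ "$scratch"/; then
    btrfs subvolume snapshot -r "$scratch" "$snap"
else
    echo "rsync failed: no snapshot taken, scratch area kept for recovery" >&2
fi
```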

In my scenario the backup script is able to hold several weeks of daily 
backups in the backlog. The destination is mounted with compress-force=zlib, 
so I get good compression too. However, I haven't yet had time to implement 
automatic cleanup of old snapshots. Feel free to add it.

HTH
Kai


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-12-07 17:09 UTC | newest]

Thread overview: 7+ messages
-- links below jump to the message on this page --
2013-12-05 17:52 Extremely slow metadata performance John Goerzen
2013-12-05 19:39 ` Duncan
2013-12-07  9:22   ` Marc MERLIN
2013-12-07 17:10     ` Kai Krakow
2013-12-05 23:41 ` Russell Coker
     [not found] ` <5930575.71jgM0vnzg@xev>
2013-12-06  4:48   ` John Goerzen
2013-12-06 14:35   ` John Goerzen
