* Oddly slow read performance with near-full largish FS
@ 2014-12-17 2:42 Charles Cazabon
2014-12-19 8:58 ` Satoru Takeuchi
2014-12-20 10:57 ` Robert White
0 siblings, 2 replies; 16+ messages in thread
From: Charles Cazabon @ 2014-12-17 2:42 UTC (permalink / raw)
To: btrfs list
Hi,
I've been running btrfs for various filesystems for a few years now, and have
recently run into problems with a large filesystem becoming *really* slow for
basic reading. None of the debugging/testing suggestions I've come across in
the wiki or in the mailing list archives seems to have helped.
Background: this particular filesystem holds backups for various other
machines on the network, a mix of rdiff-backup data (so lots of small files)
and rsync copies of larger files (everything from ~5MB data files to ~60GB VM
HD images). There's roughly 16TB of data in this filesystem (the filesystem
is ~17TB). The btrfs filesystem is a simple single volume, no snapshots,
multiple devices, or anything like that. It's an LVM logical volume on top of
dmcrypt on top of an mdadm RAID set (8 disks in RAID 6).
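(For reference, a layering like this can be confirmed at a glance on a running system; a minimal sketch, assuming the array is /dev/md0 as mentioned later in the thread:

$ lsblk -o NAME,TYPE,FSTYPE,SIZE    # shows disks -> md raid6 -> dm-crypt -> LVM -> btrfs
$ cat /proc/mdstat                  # RAID level, member count, and any rebuild in progress
)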
The performance: trying to copy the data off this filesystem to another
(non-btrfs) filesystem with rsync or just cp was taking aaaages - I found one
suggestion that it could be because updating the atimes required a COW of the
metadata in btrfs, so I mounted the filesystem noatime, but this doesn't
appear to have made any difference. The speeds I'm seeing (with iotop)
fluctuate a lot. They spend most of the time in the range of 1-3 MB/s, with
large periods of time where no IO seems to happen at all, and occasional short
spikes to ~25-30 MB/s. System load seems to sit around 10-12 (with only 2
processes reported as running, everything else sleeping) while this happens.
The server is doing nothing other than this copy at the time. The only
processes using any noticeable CPU are rsync (source and destination processes,
around 3% CPU each, plus an md0:raid6 process around 2-3%), and a handful of
"kworker" processes, perhaps one per CPU (there are 8 physical cores in the
server, plus hyperthreading).
Other filesystems on the same physical disks have no trouble exceeding 100MB/s
reads. The machine is not swapping (16GB RAM, ~8GB swap with 0 swap used).
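(One way to separate btrfs from the layers underneath it is to time a direct read of the block device itself, bypassing the filesystem; a sketch, using the /dev/mapper/vg-backup path from the btrfs fi show output below, with arbitrary sizes and offsets:

$ sudo dd if=/dev/mapper/vg-backup of=/dev/null bs=1M count=4096 iflag=direct
$ sudo dd if=/dev/mapper/vg-backup of=/dev/null bs=1M count=4096 skip=500000 iflag=direct

If the raw device sustains ~100MB/s here, the bottleneck is above the block layer rather than in md/dmcrypt/LVM.)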
Is there something obvious I'm missing here? Is there a reason I can only
average ~3MB/s reads from a btrfs filesystem?
kernel is x86_64 linux-stable 3.17.6. btrfs-progs is v3.17.3-3-g8cb0438.
Output of the various info commands is:
$ sudo btrfs fi df /media/backup/
Data, single: total=16.24TiB, used=15.73TiB
System, DUP: total=8.00MiB, used=1.75MiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=35.50GiB, used=34.05GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=0.00
$ btrfs --version
Btrfs v3.17.3-3-g8cb0438
$ sudo btrfs fi show
Label: 'backup' uuid: c18dfd04-d931-4269-b999-e94df3b1918c
Total devices 1 FS bytes used 15.76TiB
devid 1 size 16.37TiB used 16.31TiB path /dev/mapper/vg-backup
Thanks in advance for any suggestions.
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Oddly slow read performance with near-full largish FS 2014-12-17 2:42 Oddly slow read performance with near-full largish FS Charles Cazabon @ 2014-12-19 8:58 ` Satoru Takeuchi 2014-12-19 16:58 ` Charles Cazabon 2014-12-20 10:57 ` Robert White 1 sibling, 1 reply; 16+ messages in thread From: Satoru Takeuchi @ 2014-12-19 8:58 UTC (permalink / raw) To: Charles Cazabon, linux-btrfs@vger.kernel.org Hi, Sorry for late reply. Let me ask some questions. On 2014/12/17 11:42, Charles Cazabon wrote: > Hi, > > I've been running btrfs for various filesystems for a few years now, and have > recently run into problems with a large filesystem becoming *really* slow for > basic reading. None of the debugging/testing suggestions I've come across in > the wiki or in the mailing list archives seems to have helped. > > Background: this particular filesystem holds backups for various other > machines on the network, a mix of rdiff-backup data (so lots of small files) > and rsync copies of larger files (everything from ~5MB data files to ~60GB VM > HD images). There's roughly 16TB of data in this filesystem (the filesystem > is ~17TB). The btrfs filesystem is a simple single volume, no snapshots, > multiple devices, or anything like that. It's an LVM logical volume on top of > dmcrypt on top of an mdadm RAID set (8 disks in RAID 6). Q1. You mean your Btrfs file system exists on the top of the following deep layers? +---------------+ |Btrfs(single) | +---------------+ |LVM(non RAID?) | +---------------+ |dmcrypt | +---------------+ |mdadm RAID set | +---------------+ # Unfortunately, I don't know how Btrfs works in conjunction #with such a deep layers. Q2. If Q1 is true, is it possible to reduce that layers as follows? +-----------+ |Btrfs(*1) | +-----------+ |dmcrypt | +-----------+ It's because there are too many layers and these have the same/similar features and heavy layered file system tends to cause more trouble than thinner layered ones regardless of file system type. *1) Currently I don't recommend you to use RAID56 of Btrfs. So, if RAID6 is mandatory, mdadm RAID6 is also necessary. > > The performance: trying to copy the data off this filesystem to another > (non-btrfs) filesystem with rsync or just cp was taking aaaages - I found one > suggestion that it could be because updating the atimes required a COW of the > metadata in btrfs, so I mounted the filesystem noatime, but this doesn't > appear to have made any difference. The speeds I'm seeing (with iotop) > fluctuate a lot. They spend most of the time in the range of 1-3 MB/s, with > large periods of time where no IO seems to happen at all, and occasional short > spikes to ~25-30 MB/s. System load seems to sit around 10-12 (with only 2 > processes reported as running, everything else sleeping) while this happens. > The server is doing nothing other than this copy at the time. The only > processes using any noticable CPU are rsync (source and destination processes, > around 3% CPU each, plus an md0:raid6 process around 2-3%), and a handful of > "kworker" processes, perhaps one per CPU (there are 8 physical cores in the > server, plus hyperthreading). > > Other filesystems on the same physical disks have no trouble exceeding 100MB/s > reads. The machine is not swapping (16GB RAM, ~8GB swap with 0 swap used). Q3. They are also consist of the following layers? +---------------+ |XFS/ext4 | +---------------+ |LVM(non RAID?) 
| +---------------+ |dmcrypt | +---------------+ |mdadm RAID set | +---------------+ Q4. Are other filesystems also near-full? Q5. Is there any error/warning message about Btrfs/LVM/dmcrypt/mdadm/hardwares? Thanks, Satoru > > Is there something obvious I'm missing here? Is there a reason I can only > average ~3MB/s reads from a btrfs filesystem? > > kernel is x86_64 linux-stable 3.17.6. btrfs-progs is v3.17.3-3-g8cb0438. > Output of the various info commands is: > > $ sudo btrfs fi df /media/backup/ > Data, single: total=16.24TiB, used=15.73TiB > System, DUP: total=8.00MiB, used=1.75MiB > System, single: total=4.00MiB, used=0.00 > Metadata, DUP: total=35.50GiB, used=34.05GiB > Metadata, single: total=8.00MiB, used=0.00 > unknown, single: total=512.00MiB, used=0.00 > > $ btrfs --version > Btrfs v3.17.3-3-g8cb0438 > > $ sudo btrfs fi show > > Label: 'backup' uuid: c18dfd04-d931-4269-b999-e94df3b1918c > Total devices 1 FS bytes used 15.76TiB > devid 1 size 16.37TiB used 16.31TiB path /dev/mapper/vg-backup > > Thanks in advance for any suggestions. > > Charles > ^ permalink raw reply [flat|nested] 16+ messages in thread
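(For Q5, a few places worth checking; this is a sketch only - the device names are assumptions, and if the disks sit behind a hardware RAID controller, smartctl may need an extra -d option for that controller:

$ dmesg | grep -iE 'btrfs|md0|dm-|error|fail'   # kernel-side complaints from any layer
$ sudo mdadm --detail /dev/md0                  # per-member state, failed/spare counts
$ sudo smartctl -H -A /dev/sdb                  # health summary and attributes for one member disk
)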
* Re: Oddly slow read performance with near-full largish FS 2014-12-19 8:58 ` Satoru Takeuchi @ 2014-12-19 16:58 ` Charles Cazabon 2014-12-19 17:33 ` Duncan 0 siblings, 1 reply; 16+ messages in thread From: Charles Cazabon @ 2014-12-19 16:58 UTC (permalink / raw) To: linux-btrfs@vger.kernel.org Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> wrote: > > Let me ask some questions. Sure - thanks for taking an interest. > On 2014/12/17 11:42, Charles Cazabon wrote: > > There's roughly 16TB of data in this filesystem (the filesystem is ~17TB). > > The btrfs filesystem is a simple single volume, no snapshots, multiple > > devices, or anything like that. It's an LVM logical volume on top of > > dmcrypt on top of an mdadm RAID set (8 disks in RAID 6). > > Q1. You mean your Btrfs file system exists on the top of > the following deep layers? > > +---------------+ > |Btrfs(single) | > +---------------+ > |LVM(non RAID?) | > +---------------+ > |dmcrypt | > +---------------+ > |mdadm RAID set | > +---------------+ Yes, precisely. mdadm is used to make a large RAID6 device, which is encrypted with LUKS, on top of which is layered LVM (for ease of management), and the btrfs filesystem sits on that. > Q2. If Q1 is true, is it possible to reduce that layers as follows? > > +-----------+ > |Btrfs(*1) | > +-----------+ > |dmcrypt | > +-----------+ I don't see how I could do that - I simply have far too much data for a single disk (not to mention I don't want to risk loss of data from a single disk failing). This filesystem has 16.x TB of data in it at present. > It's because there are too many layers and these have > the same/similar features and heavy layered file system > tends to cause more trouble than thinner layered ones > regardless of file system type. This configuration is one I've been using for many years. It's only recently that I've noticed it being particularly slow with btrfs -- I don't know if that's because the filesystem has filled up past some critical point, or due to something else entirely. That's why I'm trying to figure this out. > *1) Currently I don't recommend you to use RAID56 of Btrfs. > So, if RAID6 is mandatory, mdadm RAID6 is also necessary. Yes, exactly. That's why I use mdadm. > > The speeds I'm seeing (with iotop) fluctuate a lot. They spend most of > > the time in the range of 1-3 MB/s, with large periods of time where no IO > > seems to happen at all, and occasional short spikes to ~25-30 MB/s. > > System load seems to sit around 10-12 (with only 2 processes reported as > > running, everything else sleeping) while this happens. [...] > > Other filesystems on the same physical disks have no trouble exceeding > > 100MB/s reads. The machine is not swapping (16GB RAM, ~8GB swap with 0 > > swap used). > > Q3. They are also consist of the following layers? Yes, exactly the same configuration. The fact that I don't see any speed problems with other filesystems (even in the same LVM volume group) leads me in the direction of suspecting something to do with btrfs. > Q4. Are other filesystems also near-full? No, not particularly. Now, the btrfs volume in question isn't exactly close to full - there's more than 500 GB free. It's just *relatively* full. > Q5. Is there any error/warning message about > Btrfs/LVM/dmcrypt/mdadm/hardwares? No, no errors or warnings in logs related to the disks, LVM, or btrfs. I have historically, with previous kernels, gotten the "task blocked for more than 120 seconds" warnings fairly often, but I haven't seen those lately. 
Is there any other info I can collect on this that would help?

Thanks,
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oddly slow read performance with near-full largish FS 2014-12-19 16:58 ` Charles Cazabon @ 2014-12-19 17:33 ` Duncan 2014-12-20 8:53 ` Chris Murphy 2014-12-20 10:03 ` Robert White 0 siblings, 2 replies; 16+ messages in thread From: Duncan @ 2014-12-19 17:33 UTC (permalink / raw) To: linux-btrfs Charles Cazabon posted on Fri, 19 Dec 2014 10:58:49 -0600 as excerpted: > This configuration is one I've been using for many years. It's only > recently that I've noticed it being particularly slow with btrfs -- I > don't know if that's because the filesystem has filled up past some > critical point, or due to something else entirely. That's why I'm > trying to figure this out. Not recommending at this point, just saying these are options... Btrfs raid56 mode should, I believe, be pretty close to done with the latest patches. That would be 3.19, however, which isn't out yet of course. There's also raid10, if you have enough devices or little enough data to do it. That's much more mature than raid56 mode and should be about as mature and stable as btrfs in single-device mode, which is what you are using now. But it'll require more devices than a raid56 would... -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oddly slow read performance with near-full largish FS 2014-12-19 17:33 ` Duncan @ 2014-12-20 8:53 ` Chris Murphy 2014-12-20 10:03 ` Robert White 1 sibling, 0 replies; 16+ messages in thread From: Chris Murphy @ 2014-12-20 8:53 UTC (permalink / raw) To: Btrfs BTRFS On Fri, Dec 19, 2014 at 10:33 AM, Duncan <1i5t5.duncan@cox.net> wrote: > Charles Cazabon posted on Fri, 19 Dec 2014 10:58:49 -0600 as excerpted: > There's also raid10, if you have enough devices or little enough data to > do it. That's much more mature than raid56 mode and should be about as > mature and stable as btrfs in single-device mode, which is what you are > using now. But it'll require more devices than a raid56 would... And also with such large storage stacks with big drives, when they fail (note I use when not if) it takes a long time to restore. So if you have the ability to break them up and use something like GlusterFS to distribute it, it helps to mitigate this as well as other kinds of failures like power supply, logic board, controllers, and with georep even the entire local site. This is not meant to indicate the current layout is wrong. Just that there are other possibilities to achieve the desired up-time and data safety. -- Chris Murphy ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oddly slow read performance with near-full largish FS 2014-12-19 17:33 ` Duncan 2014-12-20 8:53 ` Chris Murphy @ 2014-12-20 10:03 ` Robert White 1 sibling, 0 replies; 16+ messages in thread From: Robert White @ 2014-12-20 10:03 UTC (permalink / raw) To: Duncan, linux-btrfs On 12/19/2014 09:33 AM, Duncan wrote: > Charles Cazabon posted on Fri, 19 Dec 2014 10:58:49 -0600 as excerpted: > >> This configuration is one I've been using for many years. It's only >> recently that I've noticed it being particularly slow with btrfs -- I >> don't know if that's because the filesystem has filled up past some >> critical point, or due to something else entirely. That's why I'm >> trying to figure this out. > > Not recommending at this point, just saying these are options... > > Btrfs raid56 mode should, I believe, be pretty close to done with the > latest patches. That would be 3.19, however, which isn't out yet of > course. Putting the encryption above the raid is a _huge_ win that he'd lose. I've used this same layering before (though not with btrfs). So if you write a sector in this order only one encryption event (e.g. "encrypt this sector") has to take place no matter what raid level is in place. If you put the encryption below the raid, then a write or one sector on a non-degraded RAID5 requires four encryption events (two decrypts, one for the parity and one for the sector being overwritten; followed by two encryptions on the same results). In degraded conditions the profile is much worse. If encryptions and RAID > 0 is in use, he's better off with what he's got in terms of CPU and scheduling. > > There's also raid10, if you have enough devices or little enough data to > do it. That's much more mature than raid56 mode and should be about as > mature and stable as btrfs in single-device mode, which is what you are > using now. But it'll require more devices than a raid56 would... > ^ permalink raw reply [flat|nested] 16+ messages in thread
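(For concreteness, "encryption above the raid" means a single dm-crypt mapping over the whole md array. A sketch only - the device names are illustrative and the volume-group/LV names are guesses based on the /dev/mapper/vg-backup path earlier in the thread, not the poster's actual commands:

# mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]
# cryptsetup luksFormat /dev/md0
# cryptsetup luksOpen /dev/md0 md0_crypt        # one crypt layer for the whole array
# pvcreate /dev/mapper/md0_crypt
# vgcreate vg /dev/mapper/md0_crypt
# lvcreate -l 100%FREE -n backup vg
# mkfs.btrfs -L backup /dev/vg/backup

With this ordering each block btrfs writes is encrypted exactly once on its way down; pushing dm-crypt below md would instead mean one crypt mapping per member disk and several decrypt/encrypt operations for every partial-stripe RAID6 write.)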
* Re: Oddly slow read performance with near-full largish FS 2014-12-17 2:42 Oddly slow read performance with near-full largish FS Charles Cazabon 2014-12-19 8:58 ` Satoru Takeuchi @ 2014-12-20 10:57 ` Robert White 2014-12-21 16:32 ` Charles Cazabon 1 sibling, 1 reply; 16+ messages in thread From: Robert White @ 2014-12-20 10:57 UTC (permalink / raw) To: btrfs list On 12/16/2014 06:42 PM, Charles Cazabon wrote: > Hi, > > I've been running btrfs for various filesystems for a few years now, and have > recently run into problems with a large filesystem becoming *really* slow for > basic reading. None of the debugging/testing suggestions I've come across in > the wiki or in the mailing list archives seems to have helped. > > Background: this particular filesystem holds backups for various other > machines on the network, a mix of rdiff-backup data (so lots of small files) > and rsync copies of larger files (everything from ~5MB data files to ~60GB VM > HD images). There's roughly 16TB of data in this filesystem (the filesystem > is ~17TB). The btrfs filesystem is a simple single volume, no snapshots, > multiple devices, or anything like that. It's an LVM logical volume on top of > dmcrypt on top of an mdadm RAID set (8 disks in RAID 6). > > The performance: trying to copy the data off this filesystem to another > (non-btrfs) filesystem with rsync or just cp was taking aaaages - I found one > suggestion that it could be because updating the atimes required a COW of the > metadata in btrfs, so I mounted the filesystem noatime, but this doesn't > appear to have made any difference. The speeds I'm seeing (with iotop) > fluctuate a lot. They spend most of the time in the range of 1-3 MB/s, with > large periods of time where no IO seems to happen at all, and occasional short > spikes to ~25-30 MB/s. System load seems to sit around 10-12 (with only 2 > processes reported as running, everything else sleeping) while this happens. > The server is doing nothing other than this copy at the time. The only > processes using any noticable CPU are rsync (source and destination processes, > around 3% CPU each, plus an md0:raid6 process around 2-3%), and a handful of > "kworker" processes, perhaps one per CPU (there are 8 physical cores in the > server, plus hyperthreading). > > Other filesystems on the same physical disks have no trouble exceeding 100MB/s > reads. The machine is not swapping (16GB RAM, ~8GB swap with 0 swap used). > > Is there something obvious I'm missing here? Is there a reason I can only > average ~3MB/s reads from a btrfs filesystem? > > kernel is x86_64 linux-stable 3.17.6. btrfs-progs is v3.17.3-3-g8cb0438. > Output of the various info commands is: > > $ sudo btrfs fi df /media/backup/ > Data, single: total=16.24TiB, used=15.73TiB > System, DUP: total=8.00MiB, used=1.75MiB > System, single: total=4.00MiB, used=0.00 > Metadata, DUP: total=35.50GiB, used=34.05GiB > Metadata, single: total=8.00MiB, used=0.00 > unknown, single: total=512.00MiB, used=0.00 > > $ btrfs --version > Btrfs v3.17.3-3-g8cb0438 > > $ sudo btrfs fi show > > Label: 'backup' uuid: c18dfd04-d931-4269-b999-e94df3b1918c > Total devices 1 FS bytes used 15.76TiB > devid 1 size 16.37TiB used 16.31TiB path /dev/mapper/vg-backup > > Thanks in advance for any suggestions. > > Charles > Totally spit-balling ideas here (e.g. no suggestion as to which one to try first etc, just typing them as they come to me): Have you tried increasing the number of stripe buffers for the filesystem? 
If you've gotten things spread way out you might be thrashing your stripe cache. (see /sys/block/md(number here)/md/stripe_cache_size). Have you taken SMART (smartmotools etc) to these disks to see if any of them are reporting any sort of incipient failure conditions? If one or more drives is reporting recoverable read errors it might just be clogging you up. Try experimentally mounting the filesystem read-only and dong some read tests. This elimination of all possible write sources will tell you things. In particular if all your reads just start breezing through then you know something in the write path is "iffy". One thing that comes to mind is that anything accessing the drive with a barrier-style operation (wait for verification of data sync all the way to disk) would have to pass all the way down through the encryption layer which could be having a multiplier effect. (you know, lots of very short delays making a large net delay). Have you changed any hardware lately in a way that could de-optimize your interrupt handling. I have a vague recollection that somewhere in the last month and a half or so there was a patch here (or in the kernel changelogs) about an extra put operation (or something) that would cause a worker thread to roll over to -1, then spin back down to zero before work could proceed. I know, could I _be_ more vague? Right? Try switching to kernel 3.18.1 to see if the issue just goes away. (Honestly this one's just been scratching at my brain since I started writing this reply and I just _can't_ remember the reference for it... dangit...) When was the last time you did any of the maintenance things (like balance or defrag)? Not that I'd want to sit through 15Tb of that sort of thing, but I'm curious about the maintenance history. Does the read performance fall off with uptime? E.g. is it "okay" right after a system boot and then start to fall off as uptime (and activity) increases? I _imagine_ that if your filesystem huge and your server is modest by comparison in terms of ram, cache pinning and fragmentation can start becoming a real problem. What else besides marshaling this filesystem is this system used for? Have you tried segregating some of your system memory for to make sure that you aren't actually having application performance issues? I've had some luck with kernelcore= and moveablecore= (particularly moveablecore=) kernel command line options when dealing with IO induced fragmentation. On problematic systems I'll try classifying at least 1/4 of the system ram as movablecore. (e.g. on my 8GiB laptop were I do some of my experimental work, I have moveablecore=2G on the command line). Any pages that get locked into memory will be moved out of the movable-only memory first. This can have a profound (usually positive) effect on applications that want to spread out in memory. If you are running anything that likes large swaths of memory then this can help a lot. Particularly if you are also running programs that traverse large swaths of disk. Some programs (rsync of large files etc may be such a program) can do "much better" if you've done this. (BUT DON'T OVERDO IT, enough is good but too much is very bad. 8-) ). ASIDE: Anything that uses hugepages, transparent or explicit, in any serious number has a tendency to antagonize the system cache (and vice-versa). It's a silent fight of the cache-pressure sort. When you explicitly declare an amount of ram for moveable pages only, the disk cache will not grow into that space. 
so moveablecore=3G creates 3GiB of space where only unlocked pages (malloced heap, stack, etc; basically only things that can get moved -- particularly swapped -- will go in that space.) The practical effect is that certain kinds of pressures will never compete. So broad-format disk I/O (e.g. using find etc) will tend to be on one side of the barrier while video playback buffers and virtual machine's ram regions are on the other. The broad and deep filesystem you describe could be thwarting your program's attempt to access it. That is, the rsync's need to load a large number of inodes could be starving rsync for memory (etc). Keeping the disk cache out of your program's space at least in part could prevent some very "interesting" contention models from ruining your day. Or it could just make things worse. So it's worth a try but it's not gospel. 8-) ^ permalink raw reply [flat|nested] 16+ messages in thread
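(A minimal sketch of the stripe-cache, SMART, and read-only checks suggested above; the md device, disk names, and mount point are assumptions to adjust for the local system:

$ cat /sys/block/md0/md/stripe_cache_size        # default is 256
$ echo 4096 | sudo tee /sys/block/md0/md/stripe_cache_size
                                                 # costs roughly 4096 pages x 4KiB x 8 members, ~128MiB of RAM
$ sudo smartctl -t long /dev/sdb                 # start the long self-test on one member disk
$ sudo smartctl -l selftest /dev/sdb             # read the result once it completes
$ sudo mount -o remount,ro /media/backup         # the read-only experiment, reversible with remount,rw
)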
* Re: Oddly slow read performance with near-full largish FS 2014-12-20 10:57 ` Robert White @ 2014-12-21 16:32 ` Charles Cazabon 2014-12-21 21:32 ` Robert White 0 siblings, 1 reply; 16+ messages in thread From: Charles Cazabon @ 2014-12-21 16:32 UTC (permalink / raw) To: btrfs list Hi, Robert, Thanks for the response. Many of the things you mentioned I have tried, but for completeness: > Have you taken SMART (smartmotools etc) to these disks Yes. The disks are actually connected to a proper hardware RAID controller that does SMART monitoring of all the disks, although I don't use the RAID features of the controller. By using mdadm, if the controller fails I can slap the disks in another machine, or a different controller into this one, and still have it work without needing to worry about getting a replacement for this particular model of controller. There are no errors or warnings from SMART for the disks. > Try experimentally mounting the filesystem read-only Actually, I'd already done that before I mailed the list. It made no difference to the symptoms. > Have you changed any hardware lately in a way that could de-optimize > your interrupt handling. No. > I have a vague recollection that somewhere in the last month and a > half or so there was a patch here (or in the kernel changelogs) > about an extra put operation (or something) that would cause a > worker thread to roll over to -1, then spin back down to zero before > work could proceed. I know, could I _be_ more vague? Right? Try > switching to kernel 3.18.1 to see if the issue just goes away. I tend to track linux-stable pretty closely (as that seems to be recommended for btrfs use), so I already switched to 3.18.1 as soon as it came out. That made no difference to the symptoms either. > When was the last time you did any of the maintenance things (like > balance or defrag)? Not that I'd want to sit through 15Tb of that > sort of thing, but I'm curious about the maintenance history. I don't generally do those at all. I was under the impression that balance would not apply in my case as btrfs is on a single logical device, but I see that I was wrong in that impression. Is this something that is recommended on a regular basis? Most of the advice I've read regarding them is that it's no longer necessary unless there is a particular problem that these will fix... > Does the read performance fall off with uptime? No. I see these problems right from boot. > I _imagine_ that if your filesystem huge and your server is modest by > comparison in terms of ram, cache pinning and fragmentation can start > becoming a real problem. What else besides marshaling this filesystem is > this system used for? This particular server is only used for holding backups of other machines, nothing else. It has far more CPU and memory (2x quad-core Xeon plus hyperthreading, 16GB RAM) than it needs for this task. So when I say the machine is doing nothing other than this copy/rsync I'm currently running, that's practically the literal truth - there are the normal system processes and my ssh/shell running and that's about it. > Have you tried segregating some of your system memory for to make > sure that you aren't actually having application performance issues? The system isn't running out of memory; as I say, about the only userspace processes running are ssh, my shell, and rsync. However, your first suggestion caused me to slap myself: > Have you tried increasing the number of stripe buffers for the > filesystem? This I had totally forgotten. 
When I bump up the stripe cache size, it *seems* (so far, at least) to
eliminate the slowest performance I'm seeing - specifically, the periods I've
been seeing where no I/O at all seems to happen, plus the long runs of
1-3MB/s. The copy is now staying pretty much in the 22-27MB/s range.

That's not as fast as the hardware is capable of - as I say, with other
filesystems on the same hardware, I can easily see 100+MB/s - but it's much
better than it was.

Is this remaining difference (25 vs 100+ MB/s) simply due to btrfs not being
tuned for performance yet, or is there something else I'm probably
overlooking?

Thanks,
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 16+ messages in thread
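(If the larger stripe cache turns out to be the fix, note that the setting does not survive a reboot. One hedged way to persist it is a udev rule - the match and value here are a sketch, not a tested configuration:

$ echo 'SUBSYSTEM=="block", KERNEL=="md0", ACTION=="add|change", ATTR{md/stripe_cache_size}="4096"' | sudo tee /etc/udev/rules.d/60-md-stripe-cache.rules

Alternatively, the echo into /sys/block/md0/md/stripe_cache_size can simply be added to a boot-time script such as rc.local.)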
* Re: Oddly slow read performance with near-full largish FS 2014-12-21 16:32 ` Charles Cazabon @ 2014-12-21 21:32 ` Robert White 2014-12-21 22:53 ` Charles Cazabon 2014-12-22 2:13 ` Satoru Takeuchi 0 siblings, 2 replies; 16+ messages in thread From: Robert White @ 2014-12-21 21:32 UTC (permalink / raw) To: btrfs list On 12/21/2014 08:32 AM, Charles Cazabon wrote: > Hi, Robert, > > Thanks for the response. Many of the things you mentioned I have tried, but > for completeness: > >> Have you taken SMART (smartmotools etc) to these disks > There are no errors or warnings from SMART for the disks. Do make sure you are regularly running the long "offline" test. [offline is a bad name, what it really should be called is the long idle-interval test. sigh] about once a week. Otherwise SMART is just going to tell you the disk just died when it dies. I'm not saying this is relevant to the current circumstance. But since you didn't mention a testing schedule I figured it bared a mention >> Have you tried segregating some of your system memory for to make >> sure that you aren't actually having application performance issues? > > The system isn't running out of memory; as I say, about the only userspace > processes running are ssh, my shell, and rsync. The thing with "movablecore=" will not lead to an "out of memory" condition or not, its a question of cache and buffer evictions. I figured that you'd have said something about actual out of memory errors. But here's the thing. Once storage pressure gets "high enough" the system will start forgetting things intermittently to make room for other things. One of the things it will "forget" is pages of code from running programs. The other thing it can "forget" is dirent (directory entries) relevant to ongoing activity. The real killer can involve "swappiness" (e.g. /proc/sys/vm/swapiness :: the tendency of the system to drop pages of program code, do not adjust this till you understand it fully) and overall page fault rates on the system. You'll start geting evictions long before you start using _any_ swap file space. So if your effective throughput is low, the first thing to really look at is if your page fault rates are rising. Variations of sar, ps, and top may be able to tell you about the current system and/or per-process page fault rates. You'll have to compare your distro's tool set to the procedures you can find online. It's a little pernicious because it's a silent performance drain. There are no system messages to tell you "uh, hey dude, I'm doing a lot of reclaims lately and even going back to disk for pages of this program you really like". You just have to know how to look in that area. > > However, your first suggestion caused me to slap myself: > >> Have you tried increasing the number of stripe buffers for the >> filesystem? > > This I had totally forgotten. When I bump up the stripe cache size, it > *seems* (so far, at least) to eliminate the slowest performance I'm seeing - > specifically, the periods I've been seeing where no I/O at all seems to > happen, plus the long runs of 1-3MB/s. The copy is now staying pretty much in > the 22-27MB/s range. > > That's not as fast as the hardware is capable of - as I say, with other > filesystems on the same hardware, I can easily see 100+MB/s - but it's much > better than it was. > > Is this remaining difference (25 vs 100+ MB/s) simply due to btrfs not being > tuned for performance yet, or is there something else I'm probably > overlooking? 
I find BTRFS can be a little slow on my laptop, but I blame memory pressure
evicting important structures somewhat system wide. Which is part of why I
did the moveablecore= parametric tuning. I don't think there is anything
that will pack the locality of the various trees, so you can end up needing
bits of things from all over your disk in order to sequentially resolve a
large directory and compute the running checksums for rsync (etc.).

Simple rule of thumb, if "wait for I/O time" has started to rise you've got
some odd memory pressure that's sending you to idle land. It's not
hard-and-fast as a rule, but since you've said that your CPU load (which I'm
taking to be the user+system time) is staying low you are likely waiting for
something.

^ permalink raw reply [flat|nested] 16+ messages in thread
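(A few standard places to read those fault and iowait numbers from, as a sketch; sysstat's sar/pidstat are assumed to be installed, and the rsync selector is just an example:

$ vmstat 5                   # 'wa' column = time waiting for I/O, si/so = swap traffic
$ sar -B 5                   # system-wide paging: majflt/s, pgscank/s, pgsteal/s
$ pidstat -r -C rsync 5      # per-process minor/major page-fault rates for the rsync processes
$ iostat -x 5                # per-device await and %util, to see which layer is actually busy
)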
* Re: Oddly slow read performance with near-full largish FS 2014-12-21 21:32 ` Robert White @ 2014-12-21 22:53 ` Charles Cazabon 2014-12-22 0:38 ` Robert White 2014-12-22 14:16 ` Austin S Hemmelgarn 2014-12-22 2:13 ` Satoru Takeuchi 1 sibling, 2 replies; 16+ messages in thread From: Charles Cazabon @ 2014-12-21 22:53 UTC (permalink / raw) To: btrfs list Hi, Robert, My performance issues with btrfs are more-or-less resolved now -- the performance under btrfs still seems quite variable compared to other filesystems -- my rsync speed is now varying between 40MB and ~90MB/s, with occasional intervals where it drops further, down into the 10-20MB/s range. Still no disk errors or SMART warnings that would indicate that problem is at the hardware level. > Do make sure you are regularly running the long "offline" test. Ok, I'll do that. > Otherwise SMART is just going to tell you the disk just died when it dies. Ya, I'm aware of how limited/useful the SMART diagnostics are. I'm also paranoid enough to be using RAID 6... > The thing with "movablecore=" will not lead to an "out of memory" > condition or not, its a question of cache and buffer evictions. I'm fairly certain memory isn't the issue here. For what it's worth: %Cpu(s): 2.1 us, 19.4 sy, 0.0 ni, 78.0 id, 0.2 wa, 0.3 hi, 0.0 si, 0.0 st KiB Mem: 16469880 total, 16301252 used, 168628 free, 720 buffers KiB Swap: 7811068 total, 0 used, 7811068 free, 15146580 cached Swappiness I've left at the default of 60, but I'm not seeing swapping going on regardless. > > Is this remaining difference (25 vs 100+ MB/s) simply due to btrfs not being > > tuned for performance yet I found the cause of this. Stupidly enough, there was a bwlimit set up in a shell alias for rsync. So btrfs is not nearly as slow as I was seeing. It's still slower than reading from an ext4 or XFS filesystem on these disks, but the absolute level of read speed seems reasonable enough given that btrfs has not been under heavy performance tuning to date. My only remaining concern would be the variability I still see in the read speed. Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://pyropus.ca/software/ ----------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oddly slow read performance with near-full largish FS 2014-12-21 22:53 ` Charles Cazabon @ 2014-12-22 0:38 ` Robert White 2014-12-25 3:14 ` Charles Cazabon 2014-12-22 14:16 ` Austin S Hemmelgarn 1 sibling, 1 reply; 16+ messages in thread From: Robert White @ 2014-12-22 0:38 UTC (permalink / raw) To: btrfs list On 12/21/2014 02:53 PM, Charles Cazabon wrote: > Hi, Robert, > > My performance issues with btrfs are more-or-less resolved now -- the > performance under btrfs still seems quite variable compared to other > filesystems -- my rsync speed is now varying between 40MB and ~90MB/s, with > occasional intervals where it drops further, down into the 10-20MB/s range. > Still no disk errors or SMART warnings that would indicate that problem is at > the hardware level. > >> Do make sure you are regularly running the long "offline" test. > > Ok, I'll do that. > >> Otherwise SMART is just going to tell you the disk just died when it dies. > > Ya, I'm aware of how limited/useful the SMART diagnostics are. I'm also > paranoid enough to be using RAID 6... > >> The thing with "movablecore=" will not lead to an "out of memory" >> condition or not, its a question of cache and buffer evictions. > > I'm fairly certain memory isn't the issue here. For what it's worth: > > %Cpu(s): 2.1 us, 19.4 sy, 0.0 ni, 78.0 id, 0.2 wa, 0.3 hi, 0.0 si, 0.0 st > KiB Mem: 16469880 total, 16301252 used, 168628 free, 720 buffers > KiB Swap: 7811068 total, 0 used, 7811068 free, 15146580 cached > > Swappiness I've left at the default of 60, but I'm not seeing swapping going > on regardless. Swappiness has nothing to do with swapping. You have very little free memory. Here is how a linux system runs a program. When exec() is called the current process memory is wiped (dropped, forgotten, whatever) [except for a few things like the open file discriptor table]. Then the executable is opened and mmap() is called to map the text portions of that executable into memory. This does not involve any particular reading of that file. The dynamic linker also selectively mmap()s the needed libraries. So you end up with something that looks like this: [+.....................] Most of the program is not actually in memory. (the "." parts), and some minimum part is in memory (the "+" part). As you use the program more of it will work its way into memory. [+++.....+.+....++....] swappiness controls the likelihood that memory pressure will cause parts that have been read in to be "forgotten" based on the idea that it can be read again later if needed. [+++.....+.+....++....] [+.......+......+.....] This is called "demand paging", and because linux uses ELF (extensible link format) and all programs run in a uniform memory map, program text _never_ needs to be written to swap space. Windows DLL/EXE has to "relocate" the code, e.g. re-write it to make it runable. So on windows code text has to be paged to swap. So what is "swapping" and swap space for? Well next to the code is the data. [+++.....+.+....++....] {.............} As it gets written to, it cannot just be forgotten because the program needs that data or it wouldn't have written it. [+++.....+.+....++....] {..******.*...} So if the system needs to reclaim the memory used by that kind of data it sends it to the swap space. [+++.....+.+....++....] {..******.*...} ^^^^ swapping vvvv [+++.....+.+....++....] {..**...*.*...} Swappiness is how you tel the system that you want to keep the code ("+") in memory in favor of the data ("*"). 
But _long_ before you start having to actually write the data to disk in the swap space, the operating system will start casually forgetting the code. Most of both of these things, while "freshly forgotten" can be "reclaimed" from the disk/page cache. So when you show me that free memory listing what I see is someone who is bumping against their "minimum desired free memory" limit (e.g. about 1% free ram) and so has a system where it's _possible_ that a good bit of stuff is getting dumped out of the active page tables and into the disk/page cache where it could start bogging down the system in large numbers of short reclaim delays and potentially non-trivial amounts of demand paging. So not "out of memory" but not ideal. Real measurements of page fault activity and wait for io time needs to be done to determine if more action needs to be taken. Compare that to my laptop where I've deliberately made sure that memory is always available for fast transient use. Gust ~ # free total used free shared buff/cache available Mem: 7915804 2820368 2419300 47472 2676136 4790892 Swap: 8778748 311688 8467060 In this configuration when I run bursty stuff I've set aside two gig that sits around "Free" into which the dynamic load can find ample space. (I am not recommending that for you necessarily, were I present I'd do some experimenting). But that dynamic space is where the rsync would be doing its work and so be less likely to stall. (Etc). ^ permalink raw reply [flat|nested] 16+ messages in thread
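(For anyone wanting to try the reservation described above: it is a boot parameter, and the name the kernel documents is movablecore=. A sketch for a GRUB2 system, with 4G as an arbitrary example value:

# in /etc/default/grub:
GRUB_CMDLINE_LINUX="... movablecore=4G"

$ sudo update-grub                   # or grub2-mkconfig -o /boot/grub2/grub.cfg, depending on distro
$ cat /proc/cmdline                  # after reboot, confirm the parameter took
$ grep -A3 Movable /proc/zoneinfo    # the Movable zone should now have pages
)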
* Re: Oddly slow read performance with near-full largish FS
2014-12-22 0:38 ` Robert White
@ 2014-12-25 3:14 ` Charles Cazabon
0 siblings, 0 replies; 16+ messages in thread
From: Charles Cazabon @ 2014-12-25 3:14 UTC (permalink / raw)
To: btrfs list

Robert White <rwhite@pobox.com> wrote:
>
> You have very little free memory.

I think you're mistaken. Every diagnostic I've looked at says the opposite.

From 30 seconds ago on the same machine, after unmounting the big btrfs
filesystem (and with a larger xfs one mounted), /proc/meminfo says almost the
entirety of the machine's 16GB is free:

MemTotal:       16469880 kB
MemFree:        16005392 kB
MemAvailable:   15974244 kB
Buffers:              84 kB
[...]

> This is called "demand paging",

Yes, I'm aware of how this works.

Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oddly slow read performance with near-full largish FS 2014-12-21 22:53 ` Charles Cazabon 2014-12-22 0:38 ` Robert White @ 2014-12-22 14:16 ` Austin S Hemmelgarn 2014-12-25 3:15 ` Charles Cazabon 1 sibling, 1 reply; 16+ messages in thread From: Austin S Hemmelgarn @ 2014-12-22 14:16 UTC (permalink / raw) To: btrfs list [-- Attachment #1: Type: text/plain, Size: 2805 bytes --] On 2014-12-21 17:53, Charles Cazabon wrote: > Hi, Robert, > > My performance issues with btrfs are more-or-less resolved now -- the > performance under btrfs still seems quite variable compared to other > filesystems -- my rsync speed is now varying between 40MB and ~90MB/s, with > occasional intervals where it drops further, down into the 10-20MB/s range. > Still no disk errors or SMART warnings that would indicate that problem is at > the hardware level. > >> Do make sure you are regularly running the long "offline" test. > > Ok, I'll do that. > >> Otherwise SMART is just going to tell you the disk just died when it dies. > > Ya, I'm aware of how limited/useful the SMART diagnostics are. I'm also > paranoid enough to be using RAID 6... > >> The thing with "movablecore=" will not lead to an "out of memory" >> condition or not, its a question of cache and buffer evictions. > > I'm fairly certain memory isn't the issue here. For what it's worth: > > %Cpu(s): 2.1 us, 19.4 sy, 0.0 ni, 78.0 id, 0.2 wa, 0.3 hi, 0.0 si, 0.0 st > KiB Mem: 16469880 total, 16301252 used, 168628 free, 720 buffers > KiB Swap: 7811068 total, 0 used, 7811068 free, 15146580 cached > > Swappiness I've left at the default of 60, but I'm not seeing swapping going > on regardless. > >>> Is this remaining difference (25 vs 100+ MB/s) simply due to btrfs not being >>> tuned for performance yet > > I found the cause of this. Stupidly enough, there was a bwlimit set up in a > shell alias for rsync. > > So btrfs is not nearly as slow as I was seeing. It's still slower than > reading from an ext4 or XFS filesystem on these disks, but the absolute level > of read speed seems reasonable enough given that btrfs has not been under > heavy performance tuning to date. My only remaining concern would be the > variability I still see in the read speed. This actually sounds kind of like the issues I have sometimes on my laptop using btrfs on an SSD, I've mostly resolved them by tuning IO scheduler parameters, as the default IO scheduler (the supposedly Completely Fair Queue, which was obviously named by a mathematician who had never actually run the algorithm) has some pretty brain-dead default settings. The other thing I would suggest looking into regarding the variability is tuning the kernel's write-caching settings, with the defaults you're caching ~1.6G worth of writes before it forces write-back, which is a ridiculous amount; I've that the highest value that is actually usable is about 256M, and that's only if you are doing mostly bursty IO and not the throughput focused stuff that rsync does, I'd say try setting /proc/sys/vm/dirty_background_bytes to 67108864 (64M) and see if that helps things some. [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 2455 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
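(A sketch of both knobs mentioned above. On an md/dm stack the I/O scheduler applies to the underlying sdX member disks rather than to md0 or the device-mapper nodes, and the disk names here are assumptions:

$ cat /sys/block/sdb/queue/scheduler                 # e.g. "noop deadline [cfq]"
$ echo deadline | sudo tee /sys/block/sd{b..i}/queue/scheduler
$ sudo sysctl -w vm.dirty_background_bytes=67108864  # the suggested 64MiB; setting *_bytes zeroes the *_ratio twin
$ sudo sysctl -w vm.dirty_bytes=268435456            # optional: also cap total dirty memory at 256MiB
)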
* Re: Oddly slow read performance with near-full largish FS
2014-12-22 14:16 ` Austin S Hemmelgarn
@ 2014-12-25 3:15 ` Charles Cazabon
0 siblings, 0 replies; 16+ messages in thread
From: Charles Cazabon @ 2014-12-25 3:15 UTC (permalink / raw)
To: btrfs list

Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> This actually sounds kind of like the issues I have sometimes on my
> laptop using btrfs on an SSD, I've mostly resolved them by tuning IO
> scheduler parameters, as the default IO scheduler (the supposedly
> Completely Fair Queue, which was obviously named by a mathematician
> who had never actually run the algorithm) has some pretty brain-dead
> default settings. The other thing I would suggest looking into
> regarding the variability is tuning the kernel's write-caching
> settings

Ok, that's something I will examine. I knew CFQ is completely wrong for SSD
use, but I thought it was still one of the better schedulers for spinning
disks. Apparently that may not be the case.

Thanks,
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Oddly slow read performance with near-full largish FS 2014-12-21 21:32 ` Robert White 2014-12-21 22:53 ` Charles Cazabon @ 2014-12-22 2:13 ` Satoru Takeuchi 2014-12-25 3:18 ` Charles Cazabon 1 sibling, 1 reply; 16+ messages in thread From: Satoru Takeuchi @ 2014-12-22 2:13 UTC (permalink / raw) To: Robert White, btrfs list Hi, On 2014/12/22 6:32, Robert White wrote: > On 12/21/2014 08:32 AM, Charles Cazabon wrote: >> Hi, Robert, >> >> Thanks for the response. Many of the things you mentioned I have tried, but >> for completeness: >> >>> Have you taken SMART (smartmotools etc) to these disks >> There are no errors or warnings from SMART for the disks. > > > Do make sure you are regularly running the long "offline" test. [offline is a bad name, what it really should be called is the long idle-interval test. sigh] about once a week. Otherwise SMART is just going to tell you the disk just died when it dies. > > I'm not saying this is relevant to the current circumstance. But since you didn't mention a testing schedule I figured it bared a mention > >>> Have you tried segregating some of your system memory for to make >>> sure that you aren't actually having application performance issues? >> >> The system isn't running out of memory; as I say, about the only userspace >> processes running are ssh, my shell, and rsync. > > The thing with "movablecore=" will not lead to an "out of memory" condition or not, its a question of cache and buffer evictions. > > I figured that you'd have said something about actual out of memory errors. > > But here's the thing. > > Once storage pressure gets "high enough" the system will start forgetting things intermittently to make room for other things. One of the things it will "forget" is pages of code from running programs. The other thing it can "forget" is dirent (directory entries) relevant to ongoing activity. > > The real killer can involve "swappiness" (e.g. /proc/sys/vm/swapiness :: the tendency of the system to drop pages of program code, do not adjust this till you understand it fully) and overall page fault rates on the system. You'll start geting evictions long before you start using _any_ swap file space. > > So if your effective throughput is low, the first thing to really look at is if your page fault rates are rising. Variations of sar, ps, and top may be able to tell you about the current system and/or per-process page fault rates. You'll have to compare your distro's tool set to the procedures you can find online. > > It's a little pernicious because it's a silent performance drain. There are no system messages to tell you "uh, hey dude, I'm doing a lot of reclaims lately and even going back to disk for pages of this program you really like". You just have to know how to look in that area. > >> >> However, your first suggestion caused me to slap myself: >> >>> Have you tried increasing the number of stripe buffers for the >>> filesystem? >> >> This I had totally forgotten. When I bump up the stripe cache size, it >> *seems* (so far, at least) to eliminate the slowest performance I'm seeing - >> specifically, the periods I've been seeing where no I/O at all seems to >> happen, plus the long runs of 1-3MB/s. The copy is now staying pretty much in >> the 22-27MB/s range. >> >> That's not as fast as the hardware is capable of - as I say, with other >> filesystems on the same hardware, I can easily see 100+MB/s - but it's much >> better than it was. 
>>
>> Is this remaining difference (25 vs 100+ MB/s) simply due to btrfs not being
>> tuned for performance yet, or is there something else I'm probably
>> overlooking?
>
> I find BTRFS can be a little slow on my laptop, but I blame memory pressure
> evicting important structures somewhat system wide. Which is part of why I
> did the moveablecore= parametric tuning. I don't think there is anything
> that will pack the locality of the various trees, so you can end up needing
> bits of things from all over your disk in order to sequentially resolve a
> large directory and compute the running checksums for rsync (etc.).
>
> Simple rule of thumb, if "wait for I/O time" has started to rise you've got
> some odd memory pressure that's sending you to idle land. It's not
> hard-and-fast as a rule, but since you've said that your CPU load (which I'm
> taking to be the user+system time) is staying low you are likely waiting for
> something.

Capturing "echo t > /proc/sysrq-trigger" while it is waiting for I/O may help
you. It shows us where the kernel is actually waiting.

In addition, to confirm whether this problem is caused only by Btrfs or not,
the following way can be used:

1. Prepare the extra storage.
2. Copy the Btrfs data onto it with dd if=<LVM volume> of=<extra storage>
3. Use it and confirm whether this problem still happens or not.

However, since the size of your Btrfs is quite large, I guess you can't do it.
If you had such extra storage, you'd have already added it to Btrfs.

Thanks,
Satoru

^ permalink raw reply [flat|nested] 16+ messages in thread
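(A sketch of capturing that task dump; the sysrq facility has to be enabled, and the output lands in the kernel ring buffer, which can be sizeable:

$ cat /proc/sys/kernel/sysrq              # 1 means all sysrq functions are allowed
$ echo t | sudo tee /proc/sysrq-trigger   # dump every task's state and kernel stack
$ dmesg | less                            # look for rsync/btrfs threads in D state and what they block on
)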
* Re: Oddly slow read performance with near-full largish FS
2014-12-22 2:13 ` Satoru Takeuchi
@ 2014-12-25 3:18 ` Charles Cazabon
0 siblings, 0 replies; 16+ messages in thread
From: Charles Cazabon @ 2014-12-25 3:18 UTC (permalink / raw)
To: btrfs list

Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> wrote:
>
> In addition, to confirm whether this problem is caused only by Btrfs or not,
> the following way can be used:
>
> 1. Prepare the extra storage.
> 2. Copy the Btrfs data onto it with dd if=<LVM volume> of=<extra storage>
> 3. Use it and confirm whether this problem still happens or not.

I've already copied the ~16TB of data from the btrfs filesystem to an XFS
filesystem. I do not see the performance variability under xfs that I see
under btrfs.

> However, since the size of your Btrfs is quite large, I guess you can't do
> it. If you had such extra storage, you'd have already added it to Btrfs.

Actually, I decided to move to xfs, at least for now. Apparently not many
people are using btrfs with filesystems >15TB, so it seems I'm in
more-or-less uncharted territory, at least according to the responses I've
gotten when looking into this issue.

Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 16+ messages in thread