From: Russell Coker <russell@coker.com.au>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Very slow filesystem
Date: Sat, 07 Jun 2014 12:29:05 +1000
Message-ID: <1748350.S3klgLzlBt@xev>
In-Reply-To: <CAKcLGm8fFX-Hqz0iEgk2qtsUXrSNW=XsKMHiVWZvV4AoJCBGVQ@mail.gmail.com>

On Fri, 6 Jun 2014 14:06:53 Mitch Harder wrote:
> Every time you update your database, btrfs is going to update
> whichever 128 KiB blocks need to be modified.
> 
> Even for a tiny modification, the new compressed block may be slightly
> more or slightly less than 128 KiB.
> 
> If you have a 1-2 GB database that is being updated with any
> frequency, you can see how you will quickly end up with lots of
> metadata fragmentation as well as inefficient data block utilization.
> I think this will be the case even if you switch to NOCOW due to the
> compression.
> 
> On a very fundamental level, file system compression and large
> databases are two use cases that are difficult to reconcile.

The ZFS approach of using a ZIL (an intent log that absorbs synchronous writes 
before final allocation) and L2ARC (a read cache on SSD) mitigates these 
problems.  Samsung 1TB SSDs are $565 at my local computer store, so if your 
database has a working set of less than 2TB then SSDs with L2ARC should solve 
those performance problems at low cost.  The vast majority of sysadmins have 
never seen a database that's 2TB in size, let alone one with a 2TB working set.

That said, I've seen Oracle docs recommending against ZFS for large databases, 
but the Oracle definition of "large database" is probably a lot larger than 
anything that is likely to be stored on BTRFS in the near future.

Another thing to note is that there are a variety of ways of storing 
compressed data in databases.  Presumably anyone whose working set is too 
large to serve from a set of attached SSDs is going to be using some form of 
compressed tables, which leaves little for filesystem compression to gain.
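
A rough way to see that last point is to compress some table-like data once 
(standing in for the database's own compressed tables) and then compress the 
result again (standing in for filesystem compression).  This is only a sketch: 
the helper names, the fake CSV-style rows and the use of zlib are all 
assumptions made for illustration, not what any particular database or 
filesystem does.

```python
import random
import zlib


def make_table_dump(rows=20000, seed=0):
    """Build fake CSV-style rows (hypothetical stand-in for table data)."""
    rng = random.Random(seed)
    lines = [f"{i},{rng.randrange(10**9)}\n" for i in range(rows)]
    return "".join(lines).encode("ascii")


def compression_ratios(raw):
    """Return (first pass ratio, second pass ratio); lower means more gain."""
    once = zlib.compress(raw)    # the database compressing its own tables
    twice = zlib.compress(once)  # the filesystem compressing that again
    return len(once) / len(raw), len(twice) / len(once)


first, second = compression_ratios(make_table_dump())
print(f"first pass:  {first:.2f} of original size")
print(f"second pass: {second:.2f} of already-compressed size")
```

The second ratio stays close to 1.0 because the output of the first pass is 
already close to incompressible.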

On Fri, 6 Jun 2014 19:59:55 Duncan wrote:
> Similarly for checksumming.  When there are enough updates, in addition 
> to taking more time to calculate and write, checksumming simply invites 
> race conditions between the last then-valid checksum and the next update 
> invalidating it.  In addition, in many, perhaps most cases, the sorts of 
> apps that do constant internal updates, have already evolved their own 
> data integrity verification methods in ordered to cope with issues on the 
> after all way more common unverified filesystems, creating even more 
> possible race conditions and timing issues and making all that extra work 
> that btrfs normally does for verification unnecessary.  Trying to do all 
> that in-place due to NOCOW is a recipe for failure or insanity if not both

http://www.strchr.com/crc32_popcnt

The above URL has some interesting information about CRC32 speed.  In summary, 
a Core i5 system needs less than one clock cycle per byte on average, which 
at a clock speed of a few GHz works out to several GB/s per core.  So CRC32 
is only likely to become a bottleneck if your storage can sustain more than 
about 4GB/s of data transfer, and sustaining 4GB/s for a database is a very 
different problem.
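
The arithmetic behind that 4GB/s figure can be sketched as follows, assuming a 
4GHz clock and exactly one cycle per byte (both are round numbers chosen for 
illustration, not measurements from the article).  The second function times 
Python's zlib.crc32 on a zero-filled buffer as a crude sanity check; it is a 
software CRC32, so it says nothing about the hardware-assisted rates discussed 
at the URL above, and the function names are my own.

```python
import time
import zlib


def crc_throughput_gb_per_s(clock_ghz, cycles_per_byte):
    """Estimated single-core checksum rate in GB/s (1 GB = 10**9 bytes)."""
    return clock_ghz / cycles_per_byte


def measure_zlib_crc32(size_mib=16):
    """Time zlib.crc32 over a zero-filled buffer; returns (checksum, GB/s)."""
    buf = bytes(size_mib * 1024 * 1024)
    start = time.perf_counter()
    checksum = zlib.crc32(buf)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return checksum, len(buf) / elapsed / 1e9


# Assumed figures: 4GHz clock, 1 cycle per byte.
print(f"estimate: {crc_throughput_gb_per_s(4.0, 1.0):.1f} GB/s per core")
_, measured = measure_zlib_crc32()
print(f"zlib.crc32 on this machine: {measured:.2f} GB/s")
```

Anything below one cycle per byte pushes the estimate above 4GB/s, so storage 
has to be faster than that before the checksum becomes the limiting factor.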

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

