linux-btrfs.vger.kernel.org archive mirror
From: Russell Coker <russell@coker.com.au>
To: Igor M <igork20@gmail.com>
Cc: Duncan <1i5t5.duncan@cox.net>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Very slow filesystem
Date: Thu, 05 Jun 2014 20:54:27 +1000
Message-ID: <1967972.kDBB8dC4mO@xev>
In-Reply-To: <CAEezp7sahY+hcKpu=aLA0T81UWjujRoxP465jBmQrr31ESB2qg@mail.gmail.com>

On Thu, 5 Jun 2014 09:50:53 Igor M wrote:
> But data to this big tables is only appended, it's never deleted. So
> no rewrites should be happening.

When you write to the big tables the indexes will be rewritten.  Indexes can 
be in the same file as the table data or in separate files, depending on which 
database you use.  In the former case you get fragmented table files, and in 
the latter 70G of data will have index files that are large enough to get 
fragmented.

Also, when multiple files in a filesystem are being written at the same time 
(e.g. multiple tables appended to in each transaction) you will get some 
fragmentation.  Add COW and that makes a lot of fragmentation.
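
To illustrate the interleaving effect, here is a toy model in Python.  It is 
only a sketch under a deliberately naive assumption: every one-block append 
takes the next free block on disk.  Real allocators (delayed allocation, 
extent reservation) do better than this, and the function names are made up 
for the example.

```python
def count_extents(blocks):
    """Count runs of physically contiguous blocks in a file's block list."""
    if not blocks:
        return 0
    extents = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def interleaved_append(num_files, appends_per_file):
    """Toy allocator: each one-block append takes the next free disk block."""
    files = [[] for _ in range(num_files)]
    next_free = 0
    for _ in range(appends_per_file):
        for blocks in files:  # one "transaction" appends to every file
            blocks.append(next_free)
            next_free += 1
    return files


single = interleaved_append(num_files=1, appends_per_file=8)
shared = interleaved_append(num_files=3, appends_per_file=8)
print(count_extents(single[0]))  # 1: a file written alone stays contiguous
print(count_extents(shared[0]))  # 8: one extent per append when interleaved
```

In this model a file written on its own ends up as a single extent, while the 
same file written alongside two others ends up with one extent per append.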

Finally, append is done at the file level while COW rewrites at the block 
level.  If your database rounds up the allocated space to some power of 2 
larger than 4K then things will be fine for a filesystem like Ext3, where file 
offsets correspond to fixed locations on disk.  But with BTRFS that 
preallocated space filled with zeros will be rewritten to a different part of 
the disk when the allocated space is used.

If you use a database that doesn't preallocate space then COW will be invoked 
whenever the file ends at an offset that isn't a multiple of 4K (or I think 
16K for a BTRFS filesystem created with a recent mkfs.btrfs) and more data is 
appended, as appending data within a partly filled block means rewriting the 
block.
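
The arithmetic behind that can be sketched as follows.  This is a minimal 
illustration assuming a 4K data block size; BLOCK_SIZE and blocks_touched are 
names invented for the example, not anything from BTRFS itself.

```python
BLOCK_SIZE = 4096  # assumed data block size in bytes


def blocks_touched(file_size, append_len, block_size=BLOCK_SIZE):
    """Return (rewritten, new) block counts for an append at end of file.

    rewritten: existing partly filled blocks that must be written again
    new:       blocks that did not hold any file data before the append
    """
    if append_len <= 0:
        return (0, 0)
    first = file_size // block_size
    last = (file_size + append_len - 1) // block_size
    total = last - first + 1
    # An unaligned end of file means the last existing block is only partly
    # filled, so the append has to rewrite it (to a new location under COW).
    rewritten = 1 if file_size % block_size else 0
    return (rewritten, total - rewritten)


print(blocks_touched(8192, 100))   # (0, 1): file ends on a block boundary
print(blocks_touched(8292, 100))   # (1, 0): append lands inside a block
print(blocks_touched(8292, 8000))  # (1, 1): one rewrite plus one new block
```

A database that appends small records will hit the unaligned case on almost 
every write, which is where the block rewrites come from.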

I believe that COW is desirable for a database.  I don't believe that a lack 
of integrity at the filesystem level will help integrity at the database 
level.  If the working set of your database can fit in RAM then you can rely 
on cache to ensure that little data is read during operation.  For example, 
one of my database servers has been running for 330 days and the /mysql 
filesystem has writes outnumbering reads by a factor of 3:1.  When most IO is 
for writes, fragmentation of data is less of an issue, although in this case 
the server is running Ext3 so it wouldn't get the COW fragmentation issues.
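
One way to get a write:read figure like that on Linux is from the per-device 
counters in /proc/diskstats, which accumulate since boot.  The sketch below 
assumes that source (the post does not say how the 3:1 number was measured) 
and uses a made-up sample line rather than real data.

```python
def write_read_ratio(diskstats_line):
    """Return (device, sectors_written / sectors_read) for one diskstats line.

    Fields after splitting: 0 major, 1 minor, 2 device name,
    3 reads completed, 4 reads merged, 5 sectors read, 6 ms reading,
    7 writes completed, 8 writes merged, 9 sectors written, ...
    """
    fields = diskstats_line.split()
    name = fields[2]
    sectors_read = int(fields[5])
    sectors_written = int(fields[9])
    if sectors_read == 0:
        return name, float("inf")
    return name, sectors_written / sectors_read


# Hypothetical sample line, not taken from any real server.
sample = "   8       1 sda1 1000 50 200000 900 3000 70 600000 2500 0 1800 3400"
name, ratio = write_read_ratio(sample)
print(name, ratio)  # sda1 3.0
```

On a live system the same function could be applied to each line read from 
/proc/diskstats to find the device that holds the database.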

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

