CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs?

public inbox for linux-btrfs@vger.kernel.org

* CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs?
@ 2015-06-15  9:34 Ingvar Bogdahn
  2015-06-15  9:57 ` Hugo Mills
  0 siblings, 1 reply; 5+ messages in thread
From: Ingvar Bogdahn @ 2015-06-15  9:34 UTC
  To: linux-btrfs

Hello there,

I'm planning to use btrfs for a medium-sized webserver. It is commonly
recommended to set nodatacow for database files to avoid performance
degradation. However, apparently nodatacow disables some of my main
motivations for using btrfs: checksumming and (probably) incremental
backups with send/receive (please correct me if I'm wrong on this).
Also, the databases are among the most important data on my webserver,
so it is particularly there that I would like those features working.

My question is, are there strategies to avoid nodatacow of databases 
that are suitable and safe in a production server?
I thought about the following:
- in mysql/mariadb: setting "innodb_file_per_table" should avoid having
a few very big database files.
- in mysql/mariadb: adapting the database schema to store blobs in
dedicated tables.
- btrfs: set autodefrag or some cron job to regularly defrag only
database files to avoid performance degradation due to fragmentation
- turn on compression on either btrfs or mariadb

Is this likely to give me ok-ish performance? What other possibilities 
are there?

Thanks for your recommendations.

ingvar

* Re: CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs?
  2015-06-15  9:34 CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs? Ingvar Bogdahn
@ 2015-06-15  9:57 ` Hugo Mills
  2015-06-16  7:06   ` Ingvar Bogdahn
  0 siblings, 1 reply; 5+ messages in thread
From: Hugo Mills @ 2015-06-15  9:57 UTC
  To: Ingvar Bogdahn; +Cc: linux-btrfs

On Mon, Jun 15, 2015 at 11:34:35AM +0200, Ingvar Bogdahn wrote:
> Hello there,
> 
> I'm planning to use btrfs for a medium-sized webserver. It is
> commonly recommended to set nodatacow for database files to avoid
> performance degradation. However, apparently nodatacow disables some
> of my main motivations for using btrfs: checksumming and (probably)
> incremental backups with send/receive (please correct me if I'm
> wrong on this). Also, the databases are among the most important
> data on my webserver, so it is particularly there that I would like
> those features working.
> 
> My question is, are there strategies to avoid nodatacow of databases
> that are suitable and safe in a production server?
> I thought about the following:
> - in mysql/mariadb: setting "innodb_file_per_table" should avoid
> having a few very big database files.

   It's not so much about the overall size of the files, but about the
write patterns, so this probably won't be useful.

> - in mysql/mariadb: adapting the database schema to store blobs in
> dedicated tables.

   Probably not an issue -- each BLOB is likely to be written as a
single unit, which won't cause the fragmentation problems.

> - btrfs: set autodefrag or some cron job to regularly defrag only
> database files to avoid performance degradation due to fragmentation

   Autodefrag is a good idea, and I would suggest trying that first,
before anything else, to see if it gives you good enough performance
over time.

   Running an explicit defrag will break any CoW copies you have (like
snapshots), causing them to take up additional space. For example,
start with a 10 GB subvolume. Snapshot it, and you will still only
have 10 GB of disk usage. Defrag one (or both) copies, and you'll
suddenly be using 20 GB.
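
   The arithmetic behind that example can be sketched as a toy model.
It assumes, pessimistically, that defragmenting a copy rewrites all of
its data into new, unshared extents; a real defrag may rewrite less.
The function name usage_gb is made up for illustration.

```shell
# usage_gb SIZE COPIES DEFRAGGED
#   SIZE      - size of the subvolume in GB
#   COPIES    - number of copies sharing the data (original + snapshots)
#   DEFRAGGED - how many of those copies have been defragmented
usage_gb() {
  size=$1; copies=$2; defragged=$3
  # Each defragmented copy ends up with its own private extents.
  total=$((size * defragged))
  # Copies not yet defragmented still share one set of original extents.
  if [ "$defragged" -lt "$copies" ]; then
    total=$((total + size))
  fi
  echo "$total"
}
```

   With a 10 GB subvolume and one snapshot (two copies), "usage_gb 10 2 0"
prints 10, while "usage_gb 10 2 1" and "usage_gb 10 2 2" both print 20,
matching the numbers above.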

> - turn on compression on either btrfs or mariadb

   Again, won't help. The issue is not the size of the data, it's the
write patterns: small random writes into the middle of existing files
will eventually cause those files to fragment, which causes lots of
seeks and short reads, which degrades performance.

> Is this likely to give me ok-ish performance? What other
> possibilities are there?

   I would recommend benchmarking over time with your workloads, and
seeing how your performance degrades.
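
   One way to put numbers on "degrades over time", offered as a sketch
rather than something proposed in this thread: record the extent count
that filefrag (from e2fsprogs) reports for each table file next to your
query timings. The helper name extent_count is hypothetical; it only
extracts the number from filefrag's summary line.

```shell
# Read filefrag summary lines on stdin, for example
#   "mytable.ibd: 1234 extents found"
# and print just the extent counts, one per line.
extent_count() {
  sed -n 's/.*: \([0-9][0-9]*\) extents\{0,1\} found.*/\1/p'
}
```

   Usage would look like "filefrag /path/to/mytable.ibd | extent_count"
(the path is a placeholder). Note that on files compressed by btrfs the
count tends to look inflated, because each compressed chunk is reported
as its own extent, so compare trends rather than absolute values.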

   Hugo.

-- 
Hugo Mills             | You are not stuck in traffic: you are traffic
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                    German ad campaign


* Re: CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs?
  2015-06-15  9:57 ` Hugo Mills
@ 2015-06-16  7:06   ` Ingvar Bogdahn
  2015-06-16  8:49     ` Fajar A. Nugraha
  2015-06-16  9:32     ` Hugo Mills
  0 siblings, 2 replies; 5+ messages in thread
From: Ingvar Bogdahn @ 2015-06-16  7:06 UTC
  To: Ingvar Bogdahn, linux-btrfs

Hi again,

Benchmarking over time seems a good idea, but what if I see that a 
particular database does indeed degrade in performance? How can I then 
selectively improve performance for that file, since disabling cow only 
works for new empty files?

Is it correct that bundling small random writes into groups of writes 
reduces fragmentation? If so, some form of write-caching should help? 
I'm still investigating, but one solution might be:
1) identify exactly which tables have frequent writes
2) decrease the system-wide write-caching (vm.dirty_background_ratio and
vm.dirty_ratio) to lower levels, because this wastes lots of RAM by
indiscriminately caching writes of the whole system, and tends to cause
spikes where suddenly the entire cache gets written to disk and blocks
the system. Rather, use that RAM selectively to cache only the critical
files.
4) create a software RAID-1 made up of a ramdisk and a mounted image,
using mdadm.
5) set up mdadm with a rather large value for "write-behind="
6) put only those tables that have frequent writes on that disk-backed
ramdisk.

What do you think?

Ingvar




* Re: CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs?
  2015-06-16  7:06   ` Ingvar Bogdahn
@ 2015-06-16  8:49     ` Fajar A. Nugraha
  2015-06-16  9:32     ` Hugo Mills
  1 sibling, 0 replies; 5+ messages in thread
From: Fajar A. Nugraha @ 2015-06-16  8:49 UTC
  To: Ingvar Bogdahn; +Cc: linux-btrfs

On Tue, Jun 16, 2015 at 2:06 PM, Ingvar Bogdahn
<ingvar.bogdahn@googlemail.com> wrote:
> Hi again,
>
> Benchmarking over time seems a good idea, but what if I see that a
> particular database does indeed degrade in performance? How can I then
> selectively improve performance for that file, since disabling cow only
> works for new empty files?
>

You might be overcomplicating things.

> Is it correct that bundling small random writes into groups of writes
> reduces fragmentation? If so, some form of write-caching should help? I'm
> still investigating, but one solution might be:
> 1) identify exactly which tables have frequent writes
> 2) decrease the system-wide write-caching (vm.dirty_background_ratio and
> vm.dirty_ratio) to lower levels, because this wastes lots of RAM by
> indiscriminately caching writes of the whole system, and tends to cause
> spikes where suddenly the entire cache gets written to disk and blocks the
> system. Rather, use that RAM selectively to cache only the critical files.

IIRC innodb uses O_DIRECT by default, which should bypass the fs cache,
so the above should be irrelevant.
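
Whether InnoDB really bypasses the page cache depends on the
innodb_flush_method setting, and the built-in default differs between
MySQL/MariaDB versions, so it is worth checking rather than assuming.
A minimal sketch, assuming the option (if set at all) lives in a plain
my.cnf-style file; the function name is made up for illustration.

```shell
# Print the innodb_flush_method set in a my.cnf-style file, or "unset"
# if the file does not set it (the server then uses its own default).
# Option names may be written with underscores or dashes.
flush_method_from_cnf() {
  value=$(sed -n 's/^[[:space:]]*innodb[_-]flush[_-]method[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | tail -n 1)
  echo "${value:-unset}"
}
```

On a running server, "SHOW VARIABLES LIKE 'innodb_flush_method';"
reports the value actually in effect, which is the more reliable check.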


> 4) create a software RAID-1 made up of a ramdisk and a mounted image, using
> mdadm.
> 5) set up mdadm with a rather large value for "write-behind="
> 6) put only those tables that have frequent writes on that disk-backed
> ramdisk.
>

RAID-1 writes everything to both devices, so your write performance
would still be limited by the disk.
As for reads, instead of using a ramdisk for half of the md array, I
would just give that amount of RAM to innodb_buffer_pool_size.


> What do you think?


I would say "determine your priorities".

If you absolutely need btrfs + innodb, then I would:
- increase innodb_buffer_pool_size
- don't mess with nocow, leave it as is
- don't mess with autodefrag
- enable compression on btrfs
- use latest known good kernel (AFAIK 4.0.5 should be good)

If you absolutely must have high performance with innodb, then I would
look at using raw block device directly for innodb. You'd lose all
btrfs features of course (e.g. snapshots), but it's a tradeoff for
performance.

If you don't HAVE to use innodb but still want to use btrfs, then I
would use the tokudb engine instead (available in tokudb's mysql fork
and mariadb >= 10), with compression handled by tokudb (disable
compression in btrfs). tokudb doesn't support foreign key constraints,
but other than that it should be able to replace innodb for your
purposes. Among other things, tokudb uses a larger block size (4MB), so
it should help reduce fragmentation compared to innodb.

If you don't HAVE to use either btrfs or innodb, but just want "mysql
db that supports transactions with an fs that supports
snapshot/clone", then I would use zfs + tokudb. And read
http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/ (with the
exception that compression should be used in tokudb instead of zfs)

-- 
Fajar


* Re: CoW with webserver databases: innodb_file_per_table and dedicated tables for blobs?
  2015-06-16  7:06   ` Ingvar Bogdahn
  2015-06-16  8:49     ` Fajar A. Nugraha
@ 2015-06-16  9:32     ` Hugo Mills
  1 sibling, 0 replies; 5+ messages in thread
From: Hugo Mills @ 2015-06-16  9:32 UTC
  To: Ingvar Bogdahn; +Cc: linux-btrfs

On Tue, Jun 16, 2015 at 09:06:56AM +0200, Ingvar Bogdahn wrote:
> Hi again,
> 
> Benchmarking over time seems a good idea, but what if I see that a
> particular database does indeed degrade in performance? How can I
> then selectively improve performance for that file, since disabling
> cow only works for new empty files?

# mv file file.bak
# touch file
# chattr +C file
# cat file.bak >file
# rm file.bak
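
   The five commands above work because +C only takes effect on a file
that is still empty, so the data has to be copied into a freshly
created file. They can be wrapped in a small function with a couple of
safety checks. This is a sketch under assumptions, not a tested tool:
the function name and the CHATTR override variable are invented here,
the database should be stopped before touching its files, and because
touch creates the new file as the invoking user, ownership and mode
will probably need restoring afterwards.

```shell
# Command used to set the attribute; overridable for dry runs or tests.
CHATTR=${CHATTR:-chattr}

# recreate_nocow FILE
# Recreate FILE as a new empty file with the No_COW attribute (+C),
# then copy the old contents back in. If any step fails, the function
# stops and leaves the original data in FILE.bak for manual recovery.
recreate_nocow() {
  f=$1
  bak="$f.bak"
  [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
  [ ! -e "$bak" ] || { echo "backup already exists: $bak" >&2; return 1; }
  mv "$f" "$bak" &&
  touch "$f" &&
  "$CHATTR" +C "$f" &&
  cat "$bak" > "$f" &&
  rm "$bak"
}
```

   Afterwards, "lsattr file" should list the C flag for the new file.
Keep in mind that a No_COW file no longer gets data checksums, which
was one of the original reasons for wanting btrfs here.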

> Is it correct that bundling small random writes into groups of
> writes reduces fragmentation? If so, some form of write-caching
> should help?

   No, that's unlikely to help -- you're still fragmenting the
original file. Imagine a disk with a file (AAAABAAAACAAAADAAAAEAAAA)
on it, and the blocks B, C, D, E being modified. On the disk, you
might then end up with:

...AAAA.AAAA.AAAA.AAAA.AAAA.......................EDCB....

   Reading this file sequentially is going to involve 8 long seeks,
which is the fundamental problem with fragmentation. This kind of
behaviour in a CoW filesystem is inevitable; the main question is how
to minimise it.
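
   The seek count in that picture can be reproduced with a toy model.
This is only an illustration of the reasoning above, not how btrfs
allocates space: blocks are numbered, every rewritten block is moved to
the next free address in a distant free area, and a "seek" is counted
whenever the next logical block is not physically adjacent. Both
function names are made up.

```shell
# cow_layout NBLOCKS FREE_START [MODIFIED_INDEX...]
# Print the physical address of each logical block, in logical order.
# Unmodified blocks stay at their index; modified ones move to the
# free area starting at FREE_START.
cow_layout() {
  nblocks=$1; free=$2; shift 2
  i=0
  while [ "$i" -lt "$nblocks" ]; do
    pos=$i
    for m in "$@"; do
      if [ "$m" -eq "$i" ]; then pos=$free; free=$((free + 1)); fi
    done
    printf '%s ' "$pos"
    i=$((i + 1))
  done
  echo
}

# count_seeks ADDR...
# Count the jumps needed to read the given addresses in order.
count_seeks() {
  prev=""; seeks=0
  for pos in "$@"; do
    if [ -n "$prev" ] && [ "$pos" -ne $((prev + 1)) ]; then
      seeks=$((seeks + 1))
    fi
    prev=$pos
  done
  echo "$seeks"
}
```

   For the 24-block file above, with B, C, D and E at indexes 4, 9, 14
and 19, "count_seeks $(cow_layout 24 100 4 9 14 19)" prints 8: one jump
out to each relocated block and one jump back.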

   autodefrag, as I understand it, looks for high levels of
fragmentation in the few blocks near a pending write, and reads and
rewrites all of those blocks in one go. (I haven't read the code -- this
is based on my understanding of some passing remarks from josef on IRC
a while ago, so I might well be mischaracterising it).

> I'm still investigating, but one solution might be:
> 1) identify exactly which tables have frequent writes
> 2) decrease the system-wide write-caching (vm.dirty_background_ratio
> and vm.dirty_ratio) to lower levels, because this wastes lots of RAM
> by indiscriminately caching writes of the whole system, and tends to
> cause spikes where suddenly the entire cache gets written to disk
> and blocks the system. Rather, use that RAM selectively to cache only
> the critical files.
> 4) create a software RAID-1 made up of a ramdisk and a mounted
> image, using mdadm.
> 5) set up mdadm with a rather large value for "write-behind="
> 6) put only those tables that have frequent writes on that
> disk-backed ramdisk.
> 
> What do you think?

   Benchmark it. Also test it for reliability when you pull the power
out in the middle of a bunch of writes -- you're caching so much in an
ad-hoc manner that I think you're unlikely to be achieving the D part
of ACID.

   Hugo.


-- 
Hugo Mills             | emacs: Eighty Megabytes And Constantly Swapping.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
