From: Lionel Bouton <lionel-subscription-WTamNBQcZIx7tPAFqOLdPg@public.gmane.org>
To: Mark Nelson <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org>,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ@public.gmane.org,
	ceph-maintainers-Qp0mS5GaXlQ@public.gmane.org,
	ceph-announce-Qp0mS5GaXlQ@public.gmane.org
Subject: Re: Deprecating ext4 support
Date: Tue, 12 Apr 2016 01:09:44 +0200
Message-ID: <570C2EB8.2060207@bouton.name>
In-Reply-To: <570C1DBC.3040408-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hi,

On 11/04/2016 23:57, Mark Nelson wrote:
> [...]
> To add to this on the performance side, we stopped doing regular
> performance testing on ext4 (and btrfs) sometime back around when ICE
> was released to focus specifically on filestore behavior on xfs. 
> There were some cases at the time where ext4 was faster than xfs, but
> not consistently so.  btrfs is often quite fast on fresh fs, but
> degrades quickly due to fragmentation induced by cow with
> small-writes-to-large-object workloads (IE RBD small writes).  If
> btrfs auto-defrag is now safe to use in production it might be worth
> looking at again, but probably not ext4.

For BTRFS, autodefrag is probably not performance-safe yet, at least
with RBD access patterns. It wasn't in 4.1.9 when we last tested it:
performance degraded slowly but surely over several weeks, from an
initially well-performing filesystem to the point where we measured a
100% increase in average latencies plus large spikes, and we stopped
the experiment. I haven't seen any related patches on linux-btrfs since
then (it might have benefited from other modifications, but the
autodefrag algorithm itself wasn't reworked AFAIK).
That's not an inherent problem of BTRFS though, but of the autodefrag
implementation. Deactivating autodefrag and reimplementing a basic,
cautious defragmentation scheduler gave us noticeably better latencies
with BTRFS than with XFS (~30% better) on the same hardware and workload
over the long term: almost a year and countless full-disk rewrites on
the same filesystems due to both normal writes and rebalancing, with 3
to 4 months of XFS and BTRFS OSDs coexisting for comparison purposes.
I'll certainly remount a subset of our OSDs with autodefrag, as I did
with 4.1.9, when we deploy 4.4.x or a later LTS kernel, so I might have
more up-to-date information in the coming months. I don't plan to
compare BTRFS to XFS anymore though: XFS only saves us from running our
defragmentation scheduler, BTRFS is far better suited to our workload,
and we've seen constant improvements in behavior along the 3.16.x to
4.1.x road (arguably bumpy until the late 3.19 versions).
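
As a rough sketch of the remount mentioned above: the mount point layout
/var/lib/ceph/osd/ceph-<id>, the run() helper and the remount_autodefrag
name are assumptions for illustration, not something taken from our
setup. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: enable autodefrag on a chosen subset of BTRFS OSD mounts.
# Assumption: OSD filesystems are mounted at /var/lib/ceph/osd/ceph-<id>.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of running them

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Remount each given OSD id with the autodefrag mount option.
remount_autodefrag() {
    for osd_id in "$@"; do
        run mount -o remount,autodefrag "/var/lib/ceph/osd/ceph-$osd_id" || return 1
    done
}
```

Calling remount_autodefrag 3 7 would cover OSDs 3 and 7 only, which
keeps the rest of the cluster as a baseline for comparing latencies.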

Other things:

* If the journal is not on a separate partition (SSD), it should
definitely be re-created NoCoW to avoid unnecessary fragmentation. From
memory: stop the OSD, touch journal.new, chattr +C journal.new, dd
if=journal of=journal.new (with your dd options for best performance
and least cache eviction), rm journal, mv journal.new journal, then
start the OSD again.
* filestore btrfs snap = false
  is mandatory if you want consistent performance (at least on HDDs).
The difference may not be felt with almost empty OSDs, but performance
hiccups appear once any non-trivial amount of data is added to the
filesystems.
  IIRC, debugging showed that, surprisingly, the actual cause of the
performance problems was not snapshot creation but snapshot deletion.
It's so bad that the default should probably be false rather than true.
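
The journal re-creation steps from the first item can be sketched as a
small shell function. The systemd unit name ceph-osd@<id>, the default
OSD directory, the bs=1M dd option and the run() helper are assumptions
for illustration; adapt them to your init system and hardware. By
default it only prints the commands, so the sequence can be reviewed
before anything is touched.

```shell
#!/bin/sh
# Sketch only: re-create a filestore journal file as NoCoW on BTRFS.
# Assumptions: the OSD runs as systemd unit ceph-osd@<id> and the journal
# is a regular file named "journal" inside the OSD directory.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of running them

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

recreate_journal_nocow() {
    osd_id="$1"
    osd_dir="${2:-/var/lib/ceph/osd/ceph-$osd_id}"

    run systemctl stop "ceph-osd@$osd_id" || return 1
    # chattr(1) advises setting +C on new or empty files, so the file is
    # created empty first and only filled afterwards.
    run touch "$osd_dir/journal.new" || return 1
    run chattr +C "$osd_dir/journal.new" || return 1
    # bs=1M is a placeholder; pick dd options that suit your disks.
    run dd if="$osd_dir/journal" of="$osd_dir/journal.new" bs=1M || return 1
    run rm "$osd_dir/journal" || return 1
    run mv "$osd_dir/journal.new" "$osd_dir/journal" || return 1
    run systemctl start "ceph-osd@$osd_id"
}
```

Running recreate_journal_nocow 3 prints the seven commands for OSD 3;
setting DRY_RUN=0 executes them, and lsattr on the new journal should
then list the C attribute.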

Lionel

