From: Wido den Hollander <wido@42on.com>
To: Sage Weil <sage@newdream.net>,
	ceph-users@ceph.com, ceph-devel@vger.kernel.org
Subject: Re: the state of cephfs in giant
Date: Mon, 13 Oct 2014 20:20:33 +0200
Message-ID: <543C17F1.7060308@42on.com>
In-Reply-To: <alpine.DEB.2.00.1410131114130.10561@cobra.newdream.net>

On 13-10-14 20:16, Sage Weil wrote:
> We've been doing a lot of work on CephFS over the past few months. This
> is an update on the current state of things as of Giant.
> 
> What we've been working on:
> 
> * better mds/cephfs health reports to the monitor
> * mds journal dump/repair tool
> * many kernel and ceph-fuse/libcephfs client bug fixes
> * file size recovery improvements
> * client session management fixes (and tests)
> * admin socket commands for diagnosis and admin intervention
> * many bug fixes
> 
> We started using CephFS to back the teuthology (QA) infrastructure in the
> lab about three months ago. We fixed a bunch of stuff over the first
> month or two (several kernel bugs, a few MDS bugs). We've had no problems
> for the last month or so. We're currently running 0.86 (giant release
> candidate) with a single MDS and ~70 OSDs. Clients are running a 3.16
> kernel plus several fixes that went into 3.17.
> 
> 
> With Giant, we are at a point where we would ask that everyone try
> things out for any non-production workloads. We are very interested in
> feedback around stability, usability, feature gaps, and performance. We
> recommend:
> 

A question to clarify this for anybody out there. Do you think it is
safe to run CephFS on a cluster which is doing production RBD/RGW I/O?

Would it be the MDS/CephFS part that breaks, or are there potential issues
with OSD classes where bugs in CephFS could cause OSDs to crash?

I know you can't fully rule it out, but it would be useful to have this
clarified.

> * Single active MDS. You can run any number of standby MDS's, but we are
>   not focusing on multi-mds bugs just yet (and our existing multimds test
>   suite is already hitting several).
> * No snapshots. These are disabled by default and require a scary admin
>   command to enable them. Although these mostly work, there are
>   several known issues that we haven't addressed and they complicate
>   things immensely. Please avoid them for now.
> * Either the kernel client (kernel 3.17 or later) or a userspace client
>   (ceph-fuse or libcephfs); both are in good working order.
> 
> The key missing feature right now is fsck (both check and repair). This is 
> *the* development focus for Hammer.
> 
> 
> Here's a more detailed rundown of the status of various features:
> 
> * multi-mds: implemented. limited test coverage. several known issues.
>   use only for non-production workloads and expect some stability
>   issues that could lead to data loss.
> 
> * snapshots: implemented. limited test coverage. several known issues.
>   use only for non-production workloads and expect some stability issues
>   that could lead to data loss.
> 
> * hard links: stable. no known issues, but there is somewhat limited
>   test coverage (we don't test creating huge link farms).
> 
> * direct io: implemented and tested for kernel client. no special
>   support for ceph-fuse (the kernel fuse driver handles this).
> 
> * xattrs: implemented, stable, tested. no known issues (for both kernel
>   and userspace clients).
> 
> * ACLs: implemented, tested for kernel client. not implemented for
>   ceph-fuse.
> 
> * file locking (fcntl, flock): supported and tested for kernel client.
>   limited test coverage. one known minor issue for kernel with fix
>   pending. implementation in progress for ceph-fuse/libcephfs.
> 
> * kernel fscache support: implemented. no test coverage. used in
>   production by adfin.
> 
> * hadoop bindings: implemented, limited test coverage. a few known
>   issues.
> 
> * samba VFS integration: implemented, limited test coverage.
> 
> * ganesha NFS integration: implemented, no test coverage.
> 
> * kernel NFS reexport: implemented. limited test coverage. no known
>   issues.
> 
> 
> Anybody who has experienced bugs in the past should be excited by:
> 
> * new MDS admin socket commands to look at pending operations and client 
>   session states. (Check them out with "ceph daemon mds.a help"!) These 
>   will make diagnosing, debugging, and even fixing issues a lot simpler.
> 
> * the cephfs-journal-tool, which is capable of manipulating mds journal 
>   state without doing difficult exports/imports and using hexedit.
> 
> Thanks!
> sage


-- 
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
