* [PATCH 00/19] ceph: Ceph distributed file system client v0.11
@ 2009-07-22 19:51 Sage Weil
  2009-07-22 19:51 ` [PATCH 01/19] ceph: documentation Sage Weil
  0 siblings, 1 reply; 39+ messages in thread
From: Sage Weil @ 2009-07-22 19:51 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: Sage Weil

This is v0.11 of the Ceph distributed file system client.  This set
addresses issues brought up last week, and adds/cleans up a lot of the
inline comments.  Thanks to Andi Kleen, Chris Wright, J. Bruce Fields,
and Trond Myklebust for their feedback last time around.

Changes since v0.10:
 - killed max file size #define, now server-specified
 - simplified debug macro (use pr_debug) 
 - added a few missing '__attribute__ ((packed))'
 - kcalloc throughout
 - simplified export.c, now with useful comments
 - cleaned up mount code
 - kmem_cache for ceph_dentry_info, ceph_file_info (see the sketch after this list)
 - EBADF on bad caps (failed or partial reconnect to unresponsive server)
 - fixed a stray unaligned access
 - respond to control-c on slow/hung mount
 - some message encoding improvements to streamline future revisions
 - many more comments, some code cleanup
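
For readers unfamiliar with the pattern, a minimal sketch of the kmem_cache
change above follows.  The cache name, flags, and struct fields here are
assumptions for illustration only, not the actual definitions from these
patches:

#include <linux/init.h>
#include <linux/slab.h>

struct example_dentry_info {
	void *lease_session;			/* hypothetical fields */
	unsigned long lease_renew_after;
};

static struct kmem_cache *example_dentry_cachep;

static int __init example_init_caches(void)
{
	/* One slab cache per frequently-allocated per-dentry structure,
	 * instead of going through kmalloc for every allocation. */
	example_dentry_cachep = kmem_cache_create("example_dentry_info",
				sizeof(struct example_dentry_info), 0,
				SLAB_RECLAIM_ACCOUNT, NULL);
	return example_dentry_cachep ? 0 : -ENOMEM;
}

static void example_destroy_caches(void)
{
	kmem_cache_destroy(example_dentry_cachep);
}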

As before, my main question is: what would people like to see for this
to be merged into fs/?

Thanks-
sage

---
 Documentation/filesystems/ceph.txt |  140 ++
 fs/Kconfig                         |    1 +
 fs/Makefile                        |    1 +
 fs/ceph/Kconfig                    |   26 +
 fs/ceph/Makefile                   |   35 +
 fs/ceph/addr.c                     | 1092 ++++++++++++++
 fs/ceph/caps.c                     | 2642 +++++++++++++++++++++++++++++++++
 fs/ceph/ceph_debug.h               |   34 +
 fs/ceph/ceph_fs.h                  |  918 ++++++++++++
 fs/ceph/ceph_ver.h                 |    6 +
 fs/ceph/crush/crush.c              |  140 ++
 fs/ceph/crush/crush.h              |  188 +++
 fs/ceph/crush/hash.h               |   90 ++
 fs/ceph/crush/mapper.c             |  606 ++++++++
 fs/ceph/crush/mapper.h             |   20 +
 fs/ceph/debugfs.c                  |  462 ++++++
 fs/ceph/decode.h                   |  136 ++
 fs/ceph/dir.c                      | 1173 +++++++++++++++
 fs/ceph/export.c                   |  222 +++
 fs/ceph/file.c                     |  814 +++++++++++
 fs/ceph/inode.c                    | 2376 ++++++++++++++++++++++++++++++
 fs/ceph/ioctl.c                    |   64 +
 fs/ceph/ioctl.h                    |   12 +
 fs/ceph/mds_client.c               | 2833 ++++++++++++++++++++++++++++++++++++
 fs/ceph/mds_client.h               |  325 +++++
 fs/ceph/mdsmap.c                   |  139 ++
 fs/ceph/mdsmap.h                   |   47 +
 fs/ceph/messenger.c                | 2367 ++++++++++++++++++++++++++++++
 fs/ceph/messenger.h                |  253 ++++
 fs/ceph/mon_client.c               |  478 ++++++
 fs/ceph/mon_client.h               |  103 ++
 fs/ceph/msgr.h                     |  156 ++
 fs/ceph/osd_client.c               | 1008 +++++++++++++
 fs/ceph/osd_client.h               |  125 ++
 fs/ceph/osdmap.c                   |  697 +++++++++
 fs/ceph/osdmap.h                   |   83 ++
 fs/ceph/rados.h                    |  419 ++++++
 fs/ceph/snap.c                     |  887 +++++++++++
 fs/ceph/super.c                    | 1162 +++++++++++++++
 fs/ceph/super.h                    |  955 ++++++++++++
 fs/ceph/types.h                    |   27 +
 41 files changed, 23262 insertions(+), 0 deletions(-)

* [PATCH 00/19] ceph distributed file system client
@ 2009-08-05 22:30 Sage Weil
  2009-08-05 22:30 ` [PATCH 01/19] ceph: documentation Sage Weil
  0 siblings, 1 reply; 39+ messages in thread
From: Sage Weil @ 2009-08-05 22:30 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: Sage Weil

Hi,

This is v0.12 of the Ceph distributed file system client.  Changes since
v0.11 include:

 - mapping_set_error on failed writepage
 - document correct debugfs mount point
 - simplified layout/striping ioctls
 - removed bad kmalloc in writepages
 - use mempools for writeback allocations where appropriate (*)
 - fixed a problem with capability and snapshot metadata writeback
 - cleaned up fsync/fdatasync with respect to metadata writeback

(*) There are still some OOM possibilities on writeback in the
messenger library.  They could be eliminated with careful use of
mempools, but I'd like to hold off on that until it's clear the
protocol isn't going to change further.
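
For context, the mempool pattern referred to above generally looks like
the following minimal sketch; the pool size, message structure, and
function names are assumptions for illustration, not code from this
series:

#include <linux/mempool.h>
#include <linux/slab.h>

#define EXAMPLE_POOL_MIN 8		/* assumed reserve size */

struct example_msg {
	char payload[512];		/* stand-in for a real message */
};

static mempool_t *example_msg_pool;

static int example_pool_init(void)
{
	/* Reserve a minimum number of buffers so writeback can always
	 * make forward progress even when kmalloc would fail. */
	example_msg_pool = mempool_create_kmalloc_pool(EXAMPLE_POOL_MIN,
					sizeof(struct example_msg));
	return example_msg_pool ? 0 : -ENOMEM;
}

static struct example_msg *example_msg_get(void)
{
	/* GFP_NOFS avoids recursing back into the filesystem while we
	 * are already writing back dirty data. */
	return mempool_alloc(example_msg_pool, GFP_NOFS);
}

static void example_msg_put(struct example_msg *msg)
{
	mempool_free(msg, example_msg_pool);
}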

The client seems to be quite stable in a single-MDS, no-snapshot
scenario (including recovery from MDS and OSD restarts).  Thorough
testing of snapshots and multiple MDSs is coming next.  Client
authentication (beyond the current host IP checking) is the other main
item on the client todo list.

As always, I'm very interested in hearing what people would like to
see for this to be merged.

Thanks,
sage


Kernel client git tree:
        git://ceph.newdream.net/linux-ceph-client.git

System:
	git://ceph.newdream.net/ceph.git


---
 Documentation/filesystems/ceph.txt |  140 ++
 fs/Kconfig                         |    1 +
 fs/Makefile                        |    1 +
 fs/ceph/Kconfig                    |   26 +
 fs/ceph/Makefile                   |   35 +
 fs/ceph/addr.c                     | 1104 ++++++++++++++
 fs/ceph/caps.c                     | 2785 +++++++++++++++++++++++++++++++++++
 fs/ceph/ceph_debug.h               |   34 +
 fs/ceph/ceph_fs.h                  |  914 ++++++++++++
 fs/ceph/ceph_ver.h                 |    6 +
 fs/ceph/crush/crush.c              |  140 ++
 fs/ceph/crush/crush.h              |  188 +++
 fs/ceph/crush/hash.h               |   90 ++
 fs/ceph/crush/mapper.c             |  606 ++++++++
 fs/ceph/crush/mapper.h             |   20 +
 fs/ceph/debugfs.c                  |  461 ++++++
 fs/ceph/decode.h                   |  136 ++
 fs/ceph/dir.c                      | 1173 +++++++++++++++
 fs/ceph/export.c                   |  222 +++
 fs/ceph/file.c                     |  750 ++++++++++
 fs/ceph/inode.c                    | 2378 ++++++++++++++++++++++++++++++
 fs/ceph/ioctl.c                    |   98 ++
 fs/ceph/ioctl.h                    |   20 +
 fs/ceph/mds_client.c               | 2865 ++++++++++++++++++++++++++++++++++++
 fs/ceph/mds_client.h               |  326 ++++
 fs/ceph/mdsmap.c                   |  139 ++
 fs/ceph/mdsmap.h                   |   47 +
 fs/ceph/messenger.c                | 2370 +++++++++++++++++++++++++++++
 fs/ceph/messenger.h                |  253 ++++
 fs/ceph/mon_client.c               |  478 ++++++
 fs/ceph/mon_client.h               |  103 ++
 fs/ceph/msgr.h                     |  156 ++
 fs/ceph/osd_client.c               | 1096 ++++++++++++++
 fs/ceph/osd_client.h               |  136 ++
 fs/ceph/osdmap.c                   |  697 +++++++++
 fs/ceph/osdmap.h                   |   83 ++
 fs/ceph/rados.h                    |  419 ++++++
 fs/ceph/snap.c                     |  896 +++++++++++
 fs/ceph/super.c                    | 1173 +++++++++++++++
 fs/ceph/super.h                    |  963 ++++++++++++
 fs/ceph/types.h                    |   27 +
 41 files changed, 23555 insertions(+), 0 deletions(-)

* [PATCH 00/19] ceph: Ceph distributed file system client
@ 2008-11-14  0:55 Sage Weil
  2008-11-14  0:56 ` [PATCH 01/19] ceph: documentation Sage Weil
  0 siblings, 1 reply; 39+ messages in thread
From: Sage Weil @ 2008-11-14  0:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Sage Weil

This is a patch series for the client portion of the Ceph distributed 
file system (against v2.6.28-rc2).  Ceph releases have been announced 
here in the past, but no code has yet been posted for review.  My hope 
is for the client to eventually make it to mainline, but I have held off 
on posting anything until now because the network protocols and disk 
format are still changing, and the overall system is not ready for real 
usage (beyond testing and benchmarking).  However, the client itself is 
relatively complete and stable, and the earlier this is seen the better, 
so at Andrew's suggestion I'm sending this out now.  Please let me know 
what you think!

There are a few caveats attached to this patch series:

 * This is the client only.  The corresponding user space daemons need to
   be built in order to test it.  Instructions for getting a test setup
   running on a single node are at
        http://ceph.newdream.net/wiki/Small_test_cluster

 * There is some #ifdef kernel version compatibility cruft that will
   obviously be removed down the line.

 * Some of the IO error paths need a bit of work.  (Should pages be
   left dirty after a write error?  Should we try the write again?
   Etc.)  One possible shape for that path is sketched below.
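
As a strawman for the error-path question above, here is one common way
a writepage error path is structured; this is a sketch only, and
example_send_write() is a hypothetical helper, not a function from this
series:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/writeback.h>

static int example_send_write(struct page *page)
{
	/* Hypothetical synchronous write of one page to the object store. */
	return 0;
}

static int example_writepage(struct page *page, struct writeback_control *wbc)
{
	struct address_space *mapping = page->mapping;
	int err;

	set_page_writeback(page);
	err = example_send_write(page);
	if (err == -EAGAIN) {
		/* Transient failure: leave the page dirty so writeback
		 * retries it later instead of dropping the data. */
		redirty_page_for_writepage(wbc, page);
		err = 0;
	} else if (err < 0) {
		/* Hard failure: record it against the mapping so a later
		 * fsync() or close() can report the error. */
		mapping_set_error(mapping, err);
		SetPageError(page);
	}
	end_page_writeback(page);
	unlock_page(page);
	return err;
}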

Any review or comments are appreciated.

Thanks-
sage


---

Ceph is a distributed file system designed for reliability,
scalability, and performance.  The storage system consists of some
(potentially large) number of storage servers (bricks, OSDs), a
smaller set of metadata server daemons, and a few monitor daemons for
managing cluster membership and state.  The storage daemons rely on
btrfs for storing data (and take advantage of btrfs' internal
transactions to keep the local data set in a consistent state).  This
makes the storage cluster simple to deploy, while providing
scalability not typically available from block-based cluster file
systems.

Additionally, Ceph brings a few new things to Linux.  Directory-granularity
snapshots allow users to create a read-only snapshot of any directory
(and its nested contents) with 'mkdir .snap/my_snapshot' [1].
Deletion is similarly trivial ('rmdir .snap/old_snapshot').  Ceph also 
maintains recursive accounting statistics on the number of nested files, 
directories, and file sizes for each directory, making it much easier 
for an administrator to manage usage [2].
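
To make the snapshot interface above concrete, here is a small
userspace sketch; the mount point and directory names are just
examples:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	/* Snapshot the current contents of /mnt/ceph/mydir. */
	if (mkdir("/mnt/ceph/mydir/.snap/my_snapshot", 0755) != 0)
		perror("create snapshot");

	/* The snapshot is now visible (read-only) under
	 * /mnt/ceph/mydir/.snap/my_snapshot.  Later, drop it again: */
	if (rmdir("/mnt/ceph/mydir/.snap/my_snapshot") != 0)
		perror("remove snapshot");

	return 0;
}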

Basic features include:

 * Strong data and metadata consistency between clients
 * High availability and reliability.  No single points of failure.
 * N-way replication of all data across storage nodes
 * Scalability from 1 to potentially many thousands of nodes
 * Fast recovery from node failures
 * Automatic rebalancing of data on node addition/removal
 * Easy deployment: most FS components are userspace daemons

In contrast to cluster filesystems like GFS2 and OCFS2 that rely on
symmetric access by all clients to shared block devices, Ceph
separates data and metadata management into independent server
clusters, similar to Lustre.  Unlike Lustre, however, metadata and
object storage services run entirely as user space daemons.  The
storage daemon utilizes btrfs to store data objects, leveraging its
advanced features (transactions, checksumming, metadata replication,
etc.).  File data is striped across storage nodes in large chunks to
distribute workload and facilitate high throughput.  When storage
nodes fail, data is re-replicated in a distributed fashion by the
storage nodes themselves (with some minimal coordination from the
cluster monitor), making the system extremely efficient and scalable.
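
As a rough illustration of the striping described above, the sketch
below maps a file offset to an object number and an offset within that
object.  The layout fields and the 4 MB / 4-wide values are assumptions
made for the example, not parameters taken from the patches:

#include <stdint.h>
#include <stdio.h>

struct example_layout {
	uint32_t stripe_unit;	/* bytes written to one object per pass */
	uint32_t stripe_count;	/* number of objects striped across */
	uint32_t object_size;	/* maximum bytes stored per object */
};

/* Map a file offset to (object number, offset within that object). */
static void map_offset(const struct example_layout *l, uint64_t off,
		       uint64_t *objno, uint64_t *objoff)
{
	uint32_t su_per_obj = l->object_size / l->stripe_unit;
	uint64_t su_no      = off / l->stripe_unit;	/* stripe unit index */
	uint64_t stripe_no  = su_no / l->stripe_count;	/* which stripe row */
	uint64_t stripe_pos = su_no % l->stripe_count;	/* position in row */
	uint64_t objset_no  = stripe_no / su_per_obj;	/* which object set */

	*objno = objset_no * l->stripe_count + stripe_pos;
	*objoff = (stripe_no % su_per_obj) * l->stripe_unit
		  + off % l->stripe_unit;
}

int main(void)
{
	struct example_layout l = { 1 << 22, 4, 1 << 22 };	/* 4 MB, 4-wide */
	uint64_t objno, objoff;

	map_offset(&l, 50ULL << 20, &objno, &objoff);	/* offset = 50 MB */
	printf("object %llu, offset %llu\n",
	       (unsigned long long)objno, (unsigned long long)objoff);
	return 0;
}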

Metadata servers effectively form a large, consistent, distributed 
in-memory cache above the storage cluster that is scalable, dynamically 
redistributes metadata in response to workload changes, and can tolerate 
arbitrary (well, non-Byzantine) node failures.  The metadata server 
takes a somewhat unconventional approach to metadata storage to 
significantly improve performance for common workloads.  In particular, 
inodes with only a single link are embedded in directories, allowing 
entire directories of dentries and inodes to be loaded into its cache 
with a single I/O operation.  The contents of large directories can be 
fragmented and managed by independent metadata servers, allowing 
scalable concurrent access.

The system offers automatic data rebalancing/migration when scaling from 
a small cluster of just a few nodes to many hundreds, without requiring 
an administrator to carve the data set into static volumes or go through 
the tedious process of migrating data between servers.  When the file
system nears capacity, new storage nodes can easily be added and things
will "just work."


A git tree containing just the client (and this patch series) is at
        git://ceph.newdream.net/linux-ceph-client.git

The source for the full system is at
        git://ceph.newdream.net/ceph.git

The Ceph home page is at
        http://ceph.newdream.net


[1] Snapshots
        http://marc.info/?l=linux-fsdevel&m=122341525709480&w=2
[2] Recursive accounting
        http://marc.info/?l=linux-fsdevel&m=121614651204667&w=2

---
 Documentation/filesystems/ceph.txt |  173 +++
 fs/Kconfig                         |   20 +
 fs/Makefile                        |    1 +
 fs/ceph/Makefile                   |   35 +
 fs/ceph/addr.c                     | 1010 ++++++++++++++++
 fs/ceph/caps.c                     | 1464 +++++++++++++++++++++++
 fs/ceph/ceph_debug.h               |  130 ++
 fs/ceph/ceph_fs.h                  | 1225 +++++++++++++++++++
 fs/ceph/ceph_tools.c               |  125 ++
 fs/ceph/ceph_tools.h               |   19 +
 fs/ceph/crush/crush.c              |  139 +++
 fs/ceph/crush/crush.h              |  176 +++
 fs/ceph/crush/hash.h               |   80 ++
 fs/ceph/crush/mapper.c             |  507 ++++++++
 fs/ceph/crush/mapper.h             |   19 +
 fs/ceph/decode.h                   |  151 +++
 fs/ceph/dir.c                      |  891 ++++++++++++++
 fs/ceph/export.c                   |  145 +++
 fs/ceph/file.c                     |  446 +++++++
 fs/ceph/inode.c                    | 2070 ++++++++++++++++++++++++++++++++
 fs/ceph/ioctl.c                    |   72 ++
 fs/ceph/ioctl.h                    |   12 +
 fs/ceph/mds_client.c               | 2261 +++++++++++++++++++++++++++++++++++
 fs/ceph/mds_client.h               |  255 ++++
 fs/ceph/mdsmap.c                   |  123 ++
 fs/ceph/mdsmap.h                   |   41 +
 fs/ceph/messenger.c                | 2304 ++++++++++++++++++++++++++++++++++++
 fs/ceph/messenger.h                |  269 +++++
 fs/ceph/mon_client.c               |  385 ++++++
 fs/ceph/mon_client.h               |  100 ++
 fs/ceph/osd_client.c               | 1125 ++++++++++++++++++
 fs/ceph/osd_client.h               |  135 +++
 fs/ceph/osdmap.c                   |  664 +++++++++++
 fs/ceph/osdmap.h                   |   82 ++
 fs/ceph/proc.c                     |  186 +++
 fs/ceph/snap.c                     |  753 ++++++++++++
 fs/ceph/super.c                    | 1165 ++++++++++++++++++
 fs/ceph/super.h                    |  687 +++++++++++
 fs/ceph/types.h                    |   20 +
 39 files changed, 19465 insertions(+), 0 deletions(-)


Thread overview: 39+ messages (newest: 2009-08-05 22:30 UTC)
2009-07-22 19:51 [PATCH 00/19] ceph: Ceph distributed file system client v0.11 Sage Weil
2009-07-22 19:51 ` [PATCH 01/19] ceph: documentation Sage Weil
2009-07-22 19:51   ` [PATCH 02/19] ceph: on-wire types Sage Weil
2009-07-22 19:51     ` [PATCH 03/19] ceph: client types Sage Weil
2009-07-22 19:51       ` [PATCH 04/19] ceph: super.c Sage Weil
2009-07-22 19:51         ` [PATCH 05/19] ceph: inode operations Sage Weil
2009-07-22 19:51           ` [PATCH 06/19] ceph: directory operations Sage Weil
2009-07-22 19:51             ` [PATCH 07/19] ceph: file operations Sage Weil
2009-07-22 19:51               ` [PATCH 08/19] ceph: address space operations Sage Weil
2009-07-22 19:51                 ` [PATCH 09/19] ceph: MDS client Sage Weil
2009-07-22 19:51                   ` [PATCH 10/19] ceph: OSD client Sage Weil
2009-07-22 19:51                     ` [PATCH 11/19] ceph: CRUSH mapping algorithm Sage Weil
2009-07-22 19:51                       ` [PATCH 12/19] ceph: monitor client Sage Weil
2009-07-22 19:51                         ` [PATCH 13/19] ceph: capability management Sage Weil
2009-07-22 19:51                           ` [PATCH 14/19] ceph: snapshot management Sage Weil
2009-07-22 19:51                             ` [PATCH 15/19] ceph: messenger library Sage Weil
2009-07-22 19:51                               ` [PATCH 16/19] ceph: nfs re-export support Sage Weil
2009-07-22 19:51                                 ` [PATCH 17/19] ceph: ioctls Sage Weil
2009-07-22 19:51                                   ` [PATCH 18/19] ceph: debugfs Sage Weil
2009-07-22 19:51                                     ` [PATCH 19/19] ceph: Kconfig, Makefile Sage Weil
2009-07-25  5:31                                     ` [PATCH 18/19] ceph: debugfs Greg KH
2009-07-27 17:06                                       ` Sage Weil
2009-07-22 22:39                                   ` [PATCH 17/19] ceph: ioctls Andi Kleen
2009-07-22 23:52                                     ` Sage Weil
2009-07-23  6:24                                       ` Andi Kleen
2009-07-23 18:42                                         ` Sage Weil
2009-07-23 10:25                 ` [PATCH 08/19] ceph: address space operations Andi Kleen
2009-07-23 18:22                   ` Sage Weil
2009-07-23 19:16                     ` Andi Kleen
2009-07-24  4:48                       ` Sage Weil
2009-07-23 19:17                     ` Andi Kleen
2009-07-23 18:26                   ` Sage Weil
2009-07-23 18:47                     ` Trond Myklebust
2009-07-24  4:44                       ` Sage Weil
2009-07-24  6:56                         ` Andi Kleen
2009-07-24 16:52                           ` Sage Weil
2009-07-24 19:40                         ` J. Bruce Fields
  -- strict thread matches above, loose matches on Subject: below --
2009-08-05 22:30 [PATCH 00/19] ceph distributed file system client Sage Weil
2009-08-05 22:30 ` [PATCH 01/19] ceph: documentation Sage Weil
2008-11-14  0:55 [PATCH 00/19] ceph: Ceph distributed file system client Sage Weil
2008-11-14  0:56 ` [PATCH 01/19] ceph: documentation Sage Weil
