* [GIT PULL] Ceph distributed file system client for 2.6.33
From: Sage Weil @ 2009-12-07 23:25 UTC
To: torvalds; +Cc: linux-kernel, linux-fsdevel

Hi Linus,

Please pull from the 'master' branch of

  git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git master

to receive the Ceph distributed file system client.  The fs has made a
half dozen rounds on linux-fsdevel, and has been in linux-next for the
last month or so.  Although review has been sparse, Andrew said the code
looks reasonable for 2.6.33.

The git tree includes the full patchset posted in October and incremental
changes since then.  I've tried to cram in all the anticipated protocol
changes, but the file system is still strictly EXPERIMENTAL and is marked
as such.  Merging now will attract new eyes and make it easier to test and
evaluate the system (both the client and server side).

Basic features include:

 * High availability and reliability.  No single points of failure.
 * Strong data and metadata consistency between clients
 * N-way replication of all data across storage nodes
 * Seamless scaling from 1 to potentially many thousands of nodes
 * Fast recovery from node failures
 * Automatic rebalancing of data on node addition/removal
 * Easy deployment: most FS components are userspace daemons

More info on Ceph at

  http://ceph.newdream.net/

Thanks-
sage

Julia Lawall (2):
      fs/ceph: introduce missing kfree
      fs/ceph: Move a dereference below a NULL test

Noah Watkins (3):
      ceph: replace list_entry with container_of
      ceph: remove redundant use of le32_to_cpu
      ceph: fix intra strip unit length calculation

Sage Weil (93):
      ceph: documentation
      ceph: on-wire types
      ceph: client types
      ceph: ref counted buffer
      ceph: super.c
      ceph: inode operations
      ceph: directory operations
      ceph: file operations
      ceph: address space operations
      ceph: MDS client
      ceph: OSD client
      ceph: CRUSH mapping algorithm
      ceph: monitor client
      ceph: capability management
      ceph: snapshot management
      ceph: messenger library
      ceph: message pools
      ceph: nfs re-export support
      ceph: ioctls
      ceph: debugfs
      ceph: Kconfig, Makefile
      ceph: document shared files in README
      ceph: show meaningful version on module load
      ceph: include preferred_osd in file layout virtual xattr
      ceph: gracefully avoid empty crush buckets
      ceph: fix mdsmap decoding when multiple mds's are present
      ceph: renew mon subscription before it expires
      ceph: fix osd request submission race
      ceph: revoke osd request message on request completion
      ceph: fail gracefully on corrupt osdmap (bad pg_temp mapping)
      ceph: reset osd session on fault, not peer_reset
      ceph: cancel osd requests before resending them
      ceph: update to mon client protocol v15
      ceph: add file layout validation
      ceph: ignore trailing data in monamp
      ceph: remove unused CEPH_MSG_{OSD,MDS}_GETMAP
      ceph: add version field to message header
      ceph: convert encode/decode macros to inlines
      ceph: initialize sb->s_bdi, bdi_unregister after kill_anon_super
      ceph: move generic flushing code into helper
      ceph: flush dirty caps via the cap_dirty list
      ceph: correct subscribe_ack msgpool payload size
      ceph: warn on allocation from msgpool with larger front_len
      ceph: move dirty caps code around
      ceph: enable readahead
      ceph: include preferred osd in placement seed
      ceph: v0.17 of client
      ceph: move directory size logic to ceph_getattr
      ceph: remove small mon addr limit; use CEPH_MAX_MON where appropriate
      ceph: reduce parse_mount_args stack usage
      ceph: silence uninitialized variable warning
      ceph: fix, clean up string mount arg parsing
      ceph: allocate and parse mount args before client instance
      ceph: correct comment to match striping calculation
      ceph: fix object striping calculation for non-default striping schemes
      ceph: fix uninitialized err variable
      crush: always return a value from crush_bucket_choose
      ceph: init/destroy bdi in client create/destroy helpers
      ceph: use fixed endian encoding for ceph_entity_addr
      ceph: fix endian conversions for ceph_pg
      ceph: fix sparse endian warning
      ceph: convert port endianness
      ceph: clean up 'osd%d down' console msg
      ceph: make CRUSH hash functions non-inline
      ceph: use strong hash function for mapping objects to pgs
      ceph: make object hash a pg_pool property
      ceph: make CRUSH hash function a bucket property
      ceph: do not confuse stale and dead (unreconnected) caps
      ceph: separate banner and connect during handshake into distinct stages
      ceph: remove recon_gen logic
      ceph: exclude snapdir from readdir results
      ceph: initialize i_size/i_rbytes on snapdir
      ceph: pr_info when mds reconnect completes
      ceph: build cleanly without CONFIG_DEBUG_FS
      ceph: fix page invalidation deadlock
      ceph: remove bad calls to ceph_con_shutdown
      ceph: remove unnecessary ceph_con_shutdown
      ceph: handle errors during osd client init
      ceph: negotiate authentication protocol; implement AUTH_NONE protocol
      ceph: move mempool creation to ceph_create_client
      ceph: small cleanup in hash function
      ceph: fix debugfs entry, simplify fsid checks
      ceph: decode updated mdsmap format
      ceph: reset requested max_size after mds reconnect
      ceph: reset msgr backoff during open, not after successful handshake
      ceph: remove dead code
      ceph: remove useless IS_ERR checks
      ceph: plug leak of request_mutex
      ceph: whitespace cleanup
      ceph: hide /.ceph from readdir results
      ceph: allow preferred osd to be get/set via layout ioctl
      ceph: update MAINTAINERS entry with correct git URL
      ceph: mark v0.18 release

Yehuda Sadeh (1):
      ceph: mount fails immediately on error

----
 Documentation/filesystems/ceph.txt   |  139 ++
 Documentation/ioctl/ioctl-number.txt |    1 +
 MAINTAINERS                          |    9 +
 fs/Kconfig                           |    1 +
 fs/Makefile                          |    1 +
 fs/ceph/Kconfig                      |   26 +
 fs/ceph/Makefile                     |   37 +
 fs/ceph/README                       |   20 +
 fs/ceph/addr.c                       | 1115 +++++++++++++
 fs/ceph/auth.c                       |  225 +++
 fs/ceph/auth.h                       |   77 +
 fs/ceph/auth_none.c                  |  120 ++
 fs/ceph/auth_none.h                  |   28 +
 fs/ceph/buffer.c                     |   34 +
 fs/ceph/buffer.h                     |   55 +
 fs/ceph/caps.c                       | 2863 ++++++++++++++++++++++++++++++++
 fs/ceph/ceph_debug.h                 |   37 +
 fs/ceph/ceph_frag.c                  |   21 +
 fs/ceph/ceph_frag.h                  |  109 ++
 fs/ceph/ceph_fs.c                    |   74 +
 fs/ceph/ceph_fs.h                    |  648 ++++++++
 fs/ceph/ceph_hash.c                  |  118 ++
 fs/ceph/ceph_hash.h                  |   13 +
 fs/ceph/ceph_strings.c               |  176 ++
 fs/ceph/crush/crush.c                |  151 ++
 fs/ceph/crush/crush.h                |  180 ++
 fs/ceph/crush/hash.c                 |  149 ++
 fs/ceph/crush/hash.h                 |   17 +
 fs/ceph/crush/mapper.c               |  596 +++++++
 fs/ceph/crush/mapper.h               |   20 +
 fs/ceph/debugfs.c                    |  450 +++++
 fs/ceph/decode.h                     |  159 ++
 fs/ceph/dir.c                        | 1222 ++++++++++++++
 fs/ceph/export.c                     |  223 +++
 fs/ceph/file.c                       |  904 +++++++++++
 fs/ceph/inode.c                      | 1624 +++++++++++++++++++
 fs/ceph/ioctl.c                      |  160 ++
 fs/ceph/ioctl.h                      |   40 +
 fs/ceph/mds_client.c                 | 2976 ++++++++++++++++++++++++++++++++++
 fs/ceph/mds_client.h                 |  327 ++++
 fs/ceph/mdsmap.c                     |  170 ++
 fs/ceph/mdsmap.h                     |   54 +
 fs/ceph/messenger.c                  | 2103 ++++++++++++++++++++++++
 fs/ceph/messenger.h                  |  253 +++
 fs/ceph/mon_client.c                 |  751 +++++++++
 fs/ceph/mon_client.h                 |  115 ++
 fs/ceph/msgpool.c                    |  181 ++
 fs/ceph/msgpool.h                    |   27 +
 fs/ceph/msgr.h                       |  167 ++
 fs/ceph/osd_client.c                 | 1364 ++++++++++++++++
 fs/ceph/osd_client.h                 |  150 ++
 fs/ceph/osdmap.c                     |  916 +++++++++++
 fs/ceph/osdmap.h                     |  124 ++
 fs/ceph/rados.h                      |  370 +++++
 fs/ceph/snap.c                       |  887 ++++++++++
 fs/ceph/super.c                      |  984 +++++++++++
 fs/ceph/super.h                      |  895 ++++++++++
 fs/ceph/types.h                      |   29 +
 fs/ceph/xattr.c                      |  842 ++++++++++
 59 files changed, 25527 insertions(+), 0 deletions(-)
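[Several of the patches above (the CRUSH mapping algorithm, the strong
object-to-pg hash, the placement seed changes) revolve around hash-based
data placement, which is what makes the "no single points of failure" and
"automatic rebalancing" bullets possible: any client can compute where an
object lives rather than asking a central lookup server.  The sketch below
illustrates the idea with plain rendezvous (highest-random-weight) hashing.
It is emphatically not Ceph's actual CRUSH code, which adds weights and
hierarchical buckets; every name in it is invented for illustration.]

/*
 * Illustrative sketch only -- NOT Ceph's CRUSH implementation.
 * Rendezvous (highest-random-weight) hashing: every client computes an
 * object's location from the object name and the node count alone, with
 * no central lookup table.  Adding or removing a node moves only ~1/n
 * of the objects.  CRUSH generalizes this with weighted, hierarchical
 * buckets.
 */
#include <stdint.h>
#include <stdio.h>

/* FNV-1a mixed with the node id; any decent mixing hash would do. */
static uint64_t mix(const char *key, uint32_t node)
{
	uint64_t h = 14695981039346656037ULL;

	for (; *key; key++)
		h = (h ^ (uint8_t)*key) * 1099511628211ULL;
	h ^= node;
	return h * 1099511628211ULL;
}

/* Place the object on whichever node scores highest for it. */
static uint32_t place(const char *object, uint32_t num_nodes)
{
	uint32_t best_node = 0;
	uint64_t best_score = 0;

	for (uint32_t n = 0; n < num_nodes; n++) {
		uint64_t score = mix(object, n);

		if (score > best_score) {
			best_score = score;
			best_node = n;
		}
	}
	return best_node;
}

int main(void)
{
	printf("obj 'foo.0001' -> osd%u of 8\n", place("foo.0001", 8));
	printf("obj 'foo.0001' -> osd%u of 9\n", place("foo.0001", 9));
	return 0;
}

[Going from 8 to 9 nodes changes the answer only for objects whose score on
the new node now wins, roughly 1/9 of them, which is the behavior behind
the rebalancing bullet.]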
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Sage Weil @ 2009-12-18 20:54 UTC
To: torvalds; +Cc: akpm, linux-kernel, linux-fsdevel

Hi Linus,

I would still like to see ceph merged for 2.6.33.  It's certainly not
production ready, but it would be greatly beneficial to be in mainline for
the same reasons other file systems like btrfs and exofs were merged
early.

Is there more information you'd like to see from me before pulling?  If
there was a reason you decided not to pull, please let me know.

Thanks-
sage

On Mon, 7 Dec 2009, Sage Weil wrote:
> Hi Linus,
>
> Please pull from the 'master' branch of
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git master
>
> to receive the Ceph distributed file system client.
>
> [rest of the original pull request, patch list, and diffstat quoted in
> full above; trimmed here]
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Linus Torvalds @ 2009-12-18 21:38 UTC
To: Sage Weil, Gregory Haskins
Cc: Andrew Morton, Linux Kernel Mailing List, linux-fsdevel

On Fri, 18 Dec 2009, Sage Weil wrote:
>
> I would still like to see ceph merged for 2.6.33.  It's certainly not
> production ready, but it would be greatly beneficial to be in mainline for
> the same reasons other file systems like btrfs and exofs were merged
> early.

So what happened to ceph is the same thing that happened to the alacrityvm
pull request (Greg Haskins added to cc): I pretty much continually had a
_lot_ of pull requests, and all the time the priority of the ceph and
alacrityvm pull requests was just low enough on my list that I never felt
I had reason to look into the background enough to make even a half-assed
decision of whether to pull or not.

And no, "just pull" is not my default answer - if I don't have a reason,
the default action is "don't pull".

I used to say that "my job is to say 'no'", although I've been so good at
farming out submaintainers that most of the time my real job is to pull
from submaintainers who hopefully know how to say 'no'.  But when it comes
to whole new driver features, I'm still "no by default - tell me _why_ I
should pull".

So what is a new subsystem person to do?

The best thing to do is to have users that are vocal about the feature and
talk about how great it is.  Some advocates for it, in other words.  Just
a few other people saying "hey, I use this, it's great" is actually a big
deal to me.  For alacrityvm and cephfs, I didn't have that, or they just
weren't loud enough for me to hear.

So since you mentioned btrfs as an "early merge", I'll mention it too, as
a great example of how something got merged early because it had easily
gotten past my "people are asking for it" filter, to the point where _I_
was interested in trying it out personally, and asked Chris & co to tell
me when it was ready.

Ok, so that was somewhat unusual - I'm not suggesting you need to drum up
quite _that_ much hype - but it kind of illustrates the opposite extreme
of your issue.  Get some PR going, get people talking about it, get people
testing it out.  Get people outside of your area saying "hey, I use it,
and I hate having to merge it every release".

Then, when I see a pull request during the merge window, the pull suddenly
has a much higher priority, and I go "Ok, I know people are using this".

So no astroturfing, but real grass-roots support really does help (or
top-down feedback for that matter - if a _distribution_ says "we're going
to merge this in our distro regardless", that also counts as a big hint to
me that people actually expect to use it and would like to not go through
the pain of merging).

		Linus
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Jim Garlick @ 2009-12-18 23:15 UTC
To: Linus Torvalds
Cc: Sage Weil, Gregory Haskins, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel

On Fri, Dec 18, 2009 at 01:38:00PM -0800, Linus Torvalds wrote:
> On Fri, 18 Dec 2009, Sage Weil wrote:
> >
> > I would still like to see ceph merged for 2.6.33.  It's certainly not
> > production ready, but it would be greatly beneficial to be in mainline for
> > the same reasons other file systems like btrfs and exofs were merged
> > early.
>
> The best thing to do is to have users that are vocal about the
> feature and talk about how great it is.  Some advocates for it, in other
> words.  Just a few other people saying "hey, I use this, it's great" is
> actually a big deal to me.  For alacrityvm and cephfs, I didn't have that,
> or they just weren't loud enough for me to hear.

FWIW: I'd like to see it go in.

Ceph is new and experimental, so you're not going to see production shops
like ours jumping up and down saying we use it and are tired of merging
it, the way we would if Lustre were (again) on the table.

However, I will say Ceph looks good, and in the interest of nurturing
future options, I'm for merging it!

Jim Garlick
Lawrence Livermore National Laboratory
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Andi Kleen @ 2009-12-19 11:01 UTC
To: Jim Garlick
Cc: Linus Torvalds, Sage Weil, Gregory Haskins, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, greg

Jim Garlick <garlick@llnl.gov> writes:
>
> Ceph is new and experimental, so you're not going to see production shops

One issue with ceph is that I'm not sure it has any users at all.  The
mailing list seems to be pretty much dead?

On a philosophical level, I agree that network file systems are definitely
an area that could use some more improvement.

> like ours jumping up and down saying we use it and are tired of merging
> it, the way we would if Lustre were (again) on the table.

OT, but I took a look at a Lustre srpm a few months ago and it didn't seem
to still require all the horrible VFS patches that the older versions were
plagued with (or perhaps I missed them).  Because it definitely seems to
have a large real-world user base, perhaps it would at least be something
for staging these days?

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Sage Weil @ 2009-12-21 16:42 UTC
To: Andi Kleen
Cc: Jim Garlick, Linus Torvalds, Gregory Haskins, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, greg

On Sat, 19 Dec 2009, Andi Kleen wrote:
> Jim Garlick <garlick@llnl.gov> writes:
> >
> > Ceph is new and experimental, so you're not going to see production shops
>
> One issue with ceph is that I'm not sure it has any users at all.
> The mailing list seems to be pretty much dead?
>
> On a philosophical level, I agree that network file systems are
> definitely an area that could use some more improvement.

The list is slow.  The developers all work in the same office, so most of
the technical discussion ends up face to face (we're working on moving
more of it to the list).  I also tend to send users actively testing it to
the irc channel.

That said, there aren't many active users.  I see lots of interested
people lurking on the list and 'waiting for stability,' but I think the
prospect of testing an unstable cluster fs is much more daunting than
testing an unstable local one.

If you want stability, then it's probably too early to merge.  If you want
active users, that essentially hinges on stability too.  But if it's
interest in/demand for an alternative distributed fs, then the sooner it's
merged the better.

From my point of view, merging now will be a bit rockier with coordinating
releases, bug fixes, and dealing with any unforeseen client-side changes,
but I think it'll be worth it.  OTOH, another release cycle will bring
greater stability and better first impressions.

sage
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Josef Bacik @ 2010-02-09 20:43 UTC
To: Linus Torvalds
Cc: Sage Weil, Andrew Morton, Linux Kernel Mailing List, linux-fsdevel, jdarcy, rwheeler

On Fri, Dec 18, 2009 at 01:38:00PM -0800, Linus Torvalds wrote:
> On Fri, 18 Dec 2009, Sage Weil wrote:
> >
> > I would still like to see ceph merged for 2.6.33.  It's certainly not
> > production ready, but it would be greatly beneficial to be in mainline for
> > the same reasons other file systems like btrfs and exofs were merged
> > early.
>
> [Linus's full reply, quoted above in this thread, trimmed here]
>
> So no astroturfing, but real grass-roots support really does help (or
> top-down feedback for that matter - if a _distribution_ says "we're going
> to merge this in our distro regardless", that also counts as a big hint to
> me that people actually expect to use it and would like to not go through
> the pain of merging).

We have had bugzillas opened with us (Red Hat) requesting that Ceph be
included in Fedora/RHEL, so I'm here to yell loudly that somebody wants
it :).

The problem for these particular users is that sucking down a git tree,
applying patches, and building a kernel is a very high entry cost to test
something they are very excited about, so they depend on distributions to
ship the new fun stuff for them to start testing.  The problem is that
distributions do not want to ship new fun stuff that's not upstream if at
all possible (especially when it comes to filesystems).

I personally have no issue with just sucking a bunch of patches into the
Fedora kernel so people can start testing it, but I think that sends the
wrong message, since we're supposed to be following upstream and
encouraging people to push their code upstream first.  Not to mention that
it makes the actual Fedora kernel team antsy, and I already bug them
enough with what I pull in for btrfs :).  So for the time being I'm just
going to pull the userspace stuff into Fedora.

If you still feel that there are not enough users to justify pulling Ceph
in, I will probably pull the patches into the rawhide Fedora kernel when
F13 branches off, and hopefully that will pull in even more users.

Thanks,
Josef
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Valdis.Kletnieks @ 2009-12-19 5:33 UTC
To: Sage Weil; +Cc: torvalds, akpm, linux-kernel, linux-fsdevel

On Fri, 18 Dec 2009 12:54:02 PST, Sage Weil said:
> I would still like to see ceph merged for 2.6.33.  It's certainly not
> production ready, but it would be greatly beneficial to be in mainline for
> the same reasons other file systems like btrfs and exofs were merged
> early.

Is the on-the-wire protocol believed to be correct, complete, and stable?
How about any userspace APIs and on-disk formats?  In other words...

> > The git tree includes the full patchset posted in October and incremental
> > changes since then.  I've tried to cram in all the anticipated protocol
> > changes, but the file system is still strictly EXPERIMENTAL and is marked

... anything left dangling on the changes?
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Sage Weil @ 2009-12-21 16:42 UTC
To: Valdis.Kletnieks; +Cc: torvalds, akpm, linux-kernel, linux-fsdevel

On Sat, 19 Dec 2009, Valdis.Kletnieks@vt.edu wrote:
> Is the on-the-wire protocol believed to be correct, complete, and stable?
> How about any userspace APIs and on-disk formats?  In other words...
>
> ... anything left dangling on the changes?

The wire protocol is close.  There is a corner case with MDS failure
recovery that needs attention, but it can be resolved in a
backward-compatible way.  I think a compat/incompat flags mechanism during
the initial handshake might be appropriate to make changes easier going
forward.  I don't anticipate any other changes there.

There are some as-yet unresolved interface and performance issues with the
way the storage nodes interact with btrfs that have on-disk format
implications.  I hope to resolve those shortly.  Those of course do not
impact the client code.

sage
* Re: [GIT PULL] Ceph distributed file system client for 2.6.33
From: Andreas Dilger @ 2009-12-21 18:04 UTC
To: Sage Weil
Cc: Valdis.Kletnieks, torvalds, akpm, linux-kernel, linux-fsdevel

On 2009-12-21, at 09:42, Sage Weil wrote:
> I think a compat/incompat flags mechanism during the
> initial handshake might be appropriate to make changes easier going
> forward.

Having compat/incompat flags for the network protocol, implemented
correctly, is really critical for long-term maintenance.  For Lustre, we
ended up using a single set of compatibility flags:

- the client sends the full set of features that it understands
- the server replies with the strict subset of those flags that it also
  understands (i.e. client_features & server_supported_features)
- if the client doesn't have required support for a feature needed by the
  server, the server refuses to allow the client to mount
- if the server doesn't have a feature required by the client (e.g. it
  understands only some older implementation the client no longer
  supports), the client refuses to mount the filesystem

We've been able to use this mechanism for the past 5 years to maintain
protocol interoperability for Lustre, though we don't promise perpetual
interoperability - only for about 3 years or so before users have to
upgrade to a newer release.  That allows us to drop support for ancient
code instead of having to carry around baggage for every possible
combination of old features.

Using simple version numbers for the protocol means you have to carry the
baggage of every single previous version, and it isn't possible to have
"experimental" features that go out into the wild but eventually don't
make sense to keep around forever.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
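[For readers following along, here is a minimal sketch in C of the
negotiation Andreas describes.  It is not Lustre's (or Ceph's) actual
code: the feature bits, struct, and function names are all hypothetical.
The point is that both sides end up operating on a single agreed bitmask,
and each side independently refuses the mount if one of its required bits
fell out of the intersection.]

/*
 * Sketch of a compat/incompat feature-flag handshake.  All names are
 * invented for illustration; this is not Lustre or Ceph source.
 */
#include <stdbool.h>
#include <stdint.h>

#define FEAT_SNAPSHOTS   (1ULL << 0)	/* hypothetical feature bits */
#define FEAT_RECONNECT2  (1ULL << 1)
#define FEAT_NEWCRUSH    (1ULL << 2)

struct handshake {
	uint64_t offered;	/* client -> server: everything client knows */
	uint64_t agreed;	/* server -> client: offered & server set */
};

/* Server side: reply with the intersection; veto if its required bits
 * are not in it (the client never offered them). */
static bool server_accept(struct handshake *hs,
			  uint64_t server_supported,
			  uint64_t server_required)
{
	hs->agreed = hs->offered & server_supported;
	return (hs->agreed & server_required) == server_required;
}

/* Client side: veto if a feature the client requires was dropped from
 * the agreed set (the server doesn't support it). */
static bool client_accept(const struct handshake *hs,
			  uint64_t client_required)
{
	return (hs->agreed & client_required) == client_required;
}

int main(void)
{
	struct handshake hs = { .offered = FEAT_SNAPSHOTS | FEAT_RECONNECT2 };
	uint64_t server_supported = FEAT_SNAPSHOTS | FEAT_NEWCRUSH;
	uint64_t server_required  = FEAT_SNAPSHOTS;

	if (server_accept(&hs, server_supported, server_required) &&
	    client_accept(&hs, FEAT_SNAPSHOTS))
		return 0;	/* mount proceeds; hs.agreed == FEAT_SNAPSHOTS */
	return 1;		/* refuse the mount */
}

[The advantage over a plain protocol version number is that a feature bit
can be introduced as optional, later made required, or quietly retired,
without either side having to emulate every historical version.]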