* libcephfs create file with layout and replication
From: Noah Watkins @ 2012-11-17 20:13 UTC
To: ceph-devel; +Cc: Sage Weil

The Hadoop VFS layer assumes that block size and replication can be
set on a per-file basis, which is important to users for file
layout/workload optimizations.

The libcephfs interface doesn't make this entirely easy. Here is one
approach, but it isn't thread safe as the default values are global
variables in the client.

  orig_obj_size = ceph_get_default_object_size() //save
  set_default_object_size(new size)
  open(path, O_CREAT)
  set_default_object_size(orig_obj_size) //reset

Something more convenient might be:

  ceph_open_layout(path, flags, mode, layout, replication)

where layout and replication are used with O_CREAT | O_EXCL, or an
interface for setting these values explicitly on newly created files:

  ceph_open(path, O_CREAT|O_EXCL)
  ceph_set_layout(path, layout, replication)

where ceph_set_layout would succeed ostensibly on zero-length files.

Any thoughts on how to handle this?

Thanks,
Noah
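To make the thread-safety concern in the message above concrete, here is a minimal sketch in C. It does not call libcephfs. Every name in it (default_object_size, mock_file, create_via_global, create_explicit) is a hypothetical stand-in, and the 4 MiB default is only an example value.

```c
/* Hypothetical stand-ins only; none of these names come from libcephfs. */

/* Client-wide default, shared by every thread that uses the client. */
int default_object_size = 4 << 20;

struct mock_file {
    int object_size;   /* object size recorded when the file was created */
};

/* Stand-in for open(path, O_CREAT): picks up whatever the global says. */
struct mock_file mock_create(void)
{
    struct mock_file f = { default_object_size };
    return f;
}

/* Save / set / create / restore. Any other thread that creates a file
 * between the set and the restore also sees new_size, which is why this
 * pattern is not thread safe without external locking. */
struct mock_file create_via_global(int new_size)
{
    int orig = default_object_size;     /* save */
    default_object_size = new_size;
    struct mock_file f = mock_create();
    default_object_size = orig;         /* reset to the saved value */
    return f;
}

/* Layout passed with the call: no shared state is read or written. */
struct mock_file create_explicit(int object_size)
{
    struct mock_file f = { object_size };
    return f;
}
```

The second function is the shape the proposed open-with-layout call would take: the value travels with the request, so concurrent creates cannot interfere with each other.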
* Re: libcephfs create file with layout and replication
From: Josh Durgin @ 2012-11-17 21:35 UTC
To: Noah Watkins; +Cc: ceph-devel, Sage Weil

On 11/17/2012 12:13 PM, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be
> set on a per-file basis, which is important to users for file
> layout/workload optimizations.
>
> The libcephfs interface doesn't make this entirely easy. Here is one
> approach, but it isn't thread safe as the default values are global
> variables in the client.
>
>   orig_obj_size = ceph_get_default_object_size() //save
>   set_default_object_size(new size)
>   open(path, O_CREAT)
>   set_default_object_size(orig_obj_size) //reset
>
> Something more convenient might be:
>
>   ceph_open_layout(path, flags, mode, layout, replication)

I think this makes the most sense, since changing the layout of a
file after it's been created can't happen, and this interface makes
that the most clear. It also avoids maintaining extra state in
libcephfs between calls.

Since replication count is a per-pool setting, I think the hadoop
bindings would have to translate from a vfs request to a pool with the
requested replication level. So something like this, where layout is a
struct containing stripe unit, stripe count, and object size (the
subset of struct ceph_file_layout related to objects that's useful
currently):

  ceph_open_layout(path, flags, mode, layout, pool_name)

BTW, for anyone interested, there's a nice description of the layout
parameters here:

http://ceph.com/docs/master/dev/file-striping/

> where layout and replication are used with O_CREAT | O_EXCL, or an
> interface for setting these values explicitly on newly created files:
>
>   ceph_open(path, O_CREAT|O_EXCL)
>   ceph_set_layout(path, layout, replication)
>
> where ceph_set_layout would succeed ostensibly on zero-length files.
>
> Any thoughts on how to handle this?
>
> Thanks,
> Noah
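The translation described above, from a requested replication level to a pool that has that level, could look roughly like the following sketch. The table contents, the pool names, and the function name pool_for_replication are all hypothetical. A real binding would presumably get the pool list from configuration or from the cluster rather than from a static array.

```c
#include <stddef.h>

/* Hypothetical table: pool names and replication levels are examples. */
struct pool_entry {
    const char *name;
    int replication;
};

const struct pool_entry example_pools[] = {
    { "data",    2 },
    { "data_r3", 3 },
};

/* Return the name of the first pool with the requested replication
 * level, or NULL if no configured pool matches. */
const char *pool_for_replication(int replication)
{
    size_t n = sizeof(example_pools) / sizeof(example_pools[0]);
    for (size_t i = 0; i < n; i++) {
        if (example_pools[i].replication == replication)
            return example_pools[i].name;
    }
    return NULL;
}
```

Returning NULL leaves the policy for an unsupported replication level (fail the create, or fall back to a default pool) to the caller.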
* Re: libcephfs create file with layout and replication
From: Sage Weil @ 2012-11-17 23:23 UTC
To: Noah Watkins; +Cc: ceph-devel

On Sat, 17 Nov 2012, Noah Watkins wrote:
> The Hadoop VFS layer assumes that block size and replication can be
> set on a per-file basis, which is important to users for file
> layout/workload optimizations.
>
> The libcephfs interface doesn't make this entirely easy. Here is one
> approach, but it isn't thread safe as the default values are global
> variables in the client.
>
>   orig_obj_size = ceph_get_default_object_size() //save
>   set_default_object_size(new size)
>   open(path, O_CREAT)
>   set_default_object_size(orig_obj_size) //reset
>
> Something more convenient might be:
>
>   ceph_open_layout(path, flags, mode, layout, replication)
>
> where layout and replication are used with O_CREAT | O_EXCL, or an
> interface for setting these values explicitly on newly created files:
>
>   ceph_open(path, O_CREAT|O_EXCL)
>   ceph_set_layout(path, layout, replication)

This is basically what we have now... at least that's how things work
for the kernel client. We should make sure there is a clean way via
libcephfs to do that.

The client/mds protocol also allows you to specify the layout on file
creation. This is better since it has one less round trip to the MDS.
Let's just create a new open call with those additional arguments.

FWIW, the striping parameters are object size, stripe unit, stripe
count, and data pool.

sage

> where ceph_set_layout would succeed ostensibly on zero-length files.
>
> Any thoughts on how to handle this?
>
> Thanks,
> Noah
* Re: libcephfs create file with layout and replication
From: Noah Watkins @ 2012-11-17 23:58 UTC
To: Sage Weil; +Cc: ceph-devel

On Sat, Nov 17, 2012 at 3:23 PM, Sage Weil <sage@inktank.com> wrote:
> On Sat, 17 Nov 2012, Noah Watkins wrote:
>
> FWIW, the striping parameters are object size, stripe unit, stripe count,
> and data pool.

In ceph_mds_request_args.open I see all the striping parameters
except data pool, and I don't see any places that the file_replication
parameter is being used. Should a pg_pool field be added?

-Noah
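For readers unfamiliar with how object size, stripe unit, and stripe count interact, the following sketch maps a file offset to an object index and an offset within that object. It reflects my reading of the file-striping description linked earlier in the thread and is not code from Ceph. The struct and function names are hypothetical, and the data pool plays no part in this arithmetic.

```c
#include <stdint.h>

struct extent {
    uint64_t object_no;    /* index of the object within the file */
    uint64_t object_off;   /* byte offset inside that object */
};

/* Map a file offset to (object, offset in object).
 * Assumes su > 0, sc > 0, and object_size is a non-zero multiple of su. */
struct extent map_offset(uint64_t off, uint64_t su, uint64_t sc,
                         uint64_t object_size)
{
    uint64_t stripes_per_object = object_size / su;
    uint64_t blockno   = off / su;       /* stripe unit index in the file */
    uint64_t stripeno  = blockno / sc;   /* row: which stripe */
    uint64_t stripepos = blockno % sc;   /* column: which object in the set */
    uint64_t objectset = stripeno / stripes_per_object;

    struct extent e;
    e.object_no  = objectset * sc + stripepos;
    e.object_off = (stripeno % stripes_per_object) * su + off % su;
    return e;
}
```

With a 64 KiB stripe unit, a stripe count of 2, and 128 KiB objects, consecutive 64 KiB blocks alternate between objects 0 and 1 until both are full, and then move on to objects 2 and 3.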
* Re: libcephfs create file with layout and replication
From: Sage Weil @ 2012-11-18 0:15 UTC
To: Noah Watkins; +Cc: ceph-devel

On Sat, 17 Nov 2012, Noah Watkins wrote:
> On Sat, Nov 17, 2012 at 3:23 PM, Sage Weil <sage@inktank.com> wrote:
> > On Sat, 17 Nov 2012, Noah Watkins wrote:
> >
> > FWIW, the striping parameters are object size, stripe unit, stripe count,
> > and data pool.
>
> In ceph_mds_request_args.open I see all the striping parameters
> except data pool, and I don't see any places that the file_replication
> parameter is being used. Should a pg_pool field be added?

Yeah, I think this bit needs to be fixed in the on-wire protocol.
That is a delicate fix.

We ignore that for the purposes of getting the libcephfs API correct,
though...

sage
* Re: libcephfs create file with layout and replication
From: Noah Watkins @ 2012-11-18 1:20 UTC
To: Sage Weil; +Cc: ceph-devel

On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil <sage@inktank.com> wrote:
>
> We ignore that for the purposes of getting the libcephfs API correct,
> though...

Ok, makes sense. Thanks.

Noah
* Re: libcephfs create file with layout and replication
From: Noah Watkins @ 2012-11-18 20:05 UTC
To: Sage Weil; +Cc: ceph-devel

Wanna have a look at a first pass on this patch?

  wip-client-open-layout

Thanks,
Noah

On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
> On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil <sage@inktank.com> wrote:
>>
>> We ignore that for the purposes of getting the libcephfs API correct,
>> though...
>
> Ok, makes sense. Thanks.
>
> Noah
* Re: libcephfs create file with layout and replication
From: Gregory Farnum @ 2012-11-20 1:04 UTC
To: Noah Watkins; +Cc: Sage Weil, ceph-devel

On Sun, Nov 18, 2012 at 12:05 PM, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
> Wanna have a look at a first pass on this patch?
>
>   wip-client-open-layout
>
> Thanks,
> Noah

Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
2) There's already a ceph_file_layout struct which is used "widely"
(MDS, kernel, userspace client). It also has an accompanying function
that does basic validity checks.

> On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
>> On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil <sage@inktank.com> wrote:
>>>
>>> We ignore that for the purposes of getting the libcephfs API correct,
>>> though...
>>
>> Ok, makes sense. Thanks.
>>
>> Noah

FYI, there's an "unused" __le32 in the open struct (used to be for
preferred PG). We should be able to steal that away without too much
pain or massaging! :)
-Greg
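The thread does not spell out what the existing validity checks cover. As a rough illustration of the kind of checks such a function might perform, here is a sketch that works on plain integers. The function name layout_is_valid, the MIN_STRIPE_UNIT constant, and the 64 KiB granularity are assumptions on my part, not taken from the messages above.

```c
#include <stdint.h>

/* Assumed granularity for stripe unit and object size (64 KiB). */
#define MIN_STRIPE_UNIT 65536u

/* Return 1 if the layout parameters look consistent, 0 otherwise.
 * su = stripe unit, sc = stripe count, os = object size. */
int layout_is_valid(uint32_t su, uint32_t sc, uint32_t os)
{
    /* Stripe unit must be non-zero and a multiple of the granularity. */
    if (su == 0 || su % MIN_STRIPE_UNIT != 0)
        return 0;
    /* Same constraint for the object size. */
    if (os == 0 || os % MIN_STRIPE_UNIT != 0)
        return 0;
    /* An object must hold a whole number of stripe units. */
    if (os % su != 0)
        return 0;
    /* At least one object per stripe. */
    if (sc == 0)
        return 0;
    return 1;
}
```

Running a check like this on the client before the request is sent would give the caller an immediate error without a round trip to the MDS.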
* Re: libcephfs create file with layout and replication
From: Noah Watkins @ 2012-11-20 2:48 UTC
To: Gregory Farnum; +Cc: Sage Weil, ceph-devel

On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum <greg@inktank.com> wrote:
>
> Just glanced over this, and I'm curious:
> 1) Why symlink another reference to your file_layout.h?

I followed the same pattern as page.h in librados, but may have
misunderstood its use. When libcephfs.h is installed, it includes

  #include "file_layout.h"

and we assume the user has -Iprefix/cephfs/. In the build tree,
however, include/cephfs isn't on the include path, hence the symlink.

> 2) There's already a ceph_file_layout struct which is used "widely"
> (MDS, kernel, userspace client). It also has an accompanying function
> that does basic validity checks.

I avoided ceph_file_layout because I was under the impression that all
of the __le64 stuff in it was very much Linux-specific. I had run into
a lot of this hacking on an OSX port.

> FYI, there's an "unused" __le32 in the open struct (used to be for
> preferred PG). We should be able to steal that away without too much
> pain or massaging! :)

Nice. Do you think I should revert back to using ceph_file_layout?

Thanks,
Noah
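On the portability point: one common way to avoid depending on Linux-specific little-endian typedefs in a public header is to encode and decode fixed-width integers byte by byte. The sketch below shows that general technique. It is not what the patch does, and put_le32 and get_le32 are hypothetical helper names.

```c
#include <stdint.h>

/* Store v into out[0..3] in little-endian byte order, regardless of
 * the host's native endianness. */
void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into host order. */
uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Only `<stdint.h>` is required, so the same code builds on Linux and OS X alike.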
* Re: libcephfs create file with layout and replication
From: Sage Weil @ 2012-11-20 3:28 UTC
To: Noah Watkins; +Cc: Gregory Farnum, ceph-devel

On Mon, 19 Nov 2012, Noah Watkins wrote:
> On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum <greg@inktank.com> wrote:
> >
> > Just glanced over this, and I'm curious:
> > 1) Why symlink another reference to your file_layout.h?
>
> I followed the same pattern as page.h in librados, but may have
> misunderstood its use. When libcephfs.h is installed, it includes
>
>   #include "file_layout.h"
>
> and we assume the user has -Iprefix/cephfs/. In the build tree,
> however, include/cephfs isn't on the include path, hence the symlink.
>
> > 2) There's already a ceph_file_layout struct which is used "widely"
> > (MDS, kernel, userspace client). It also has an accompanying function
> > that does basic validity checks.
>
> I avoided ceph_file_layout because I was under the impression that all
> of the __le64 stuff in it was very much Linux-specific. I had run into
> a lot of this hacking on an OSX port.
>
> > FYI, there's an "unused" __le32 in the open struct (used to be for
> > preferred PG). We should be able to steal that away without too much
> > pain or massaging! :)
>
> Nice. Do you think I should revert back to using ceph_file_layout?

We could avoid the whole issue by passing 4 arguments to the
function...
* Re: libcephfs create file with layout and replication
From: Noah Watkins @ 2012-11-20 21:59 UTC
To: Sage Weil; +Cc: Gregory Farnum, ceph-devel

On Mon, Nov 19, 2012 at 7:28 PM, Sage Weil <sage@inktank.com> wrote:
>
> We could avoid the whole issue by passing 4 arguments to the function...

I pushed a new patch that takes each of the 4 new arguments.

  wip-client-open-layout

Thanks,
-Noah
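The patch itself is not shown in the thread, so the following is only a guess at the call shape being described: stripe unit, stripe count, object size, and data pool passed as four plain arguments instead of a shared struct. The function open_layout_stub and the open_request record are hypothetical. The stub just records its arguments so the shape can be exercised without a cluster.

```c
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical record of what a create request would carry. */
struct open_request {
    const char *path;
    int flags;
    int mode;
    int stripe_unit;
    int stripe_count;
    int object_size;
    const char *data_pool;
};

/* Hypothetical stub, not the libcephfs call. It records the arguments
 * and returns 0, or returns -1 when O_CREAT is missing, since the layout
 * only applies to a file that is being created. */
int open_layout_stub(struct open_request *out, const char *path,
                     int flags, int mode, int stripe_unit,
                     int stripe_count, int object_size,
                     const char *data_pool)
{
    if (out == NULL || !(flags & O_CREAT))
        return -1;
    out->path = path;
    out->flags = flags;
    out->mode = mode;
    out->stripe_unit = stripe_unit;
    out->stripe_count = stripe_count;
    out->object_size = object_size;
    out->data_pool = data_pool;
    return 0;
}
```

Because every parameter is a built-in type, the public header needs no layout struct at all, which sidesteps both the symlinked header and the little-endian typedef questions raised earlier.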
Thread overview: 11+ messages
2012-11-17 20:13 libcephfs create file with layout and replication Noah Watkins
2012-11-17 21:35 ` Josh Durgin
2012-11-17 23:23 ` Sage Weil
2012-11-17 23:58   ` Noah Watkins
2012-11-18  0:15     ` Sage Weil
2012-11-18  1:20       ` Noah Watkins
2012-11-18 20:05         ` Noah Watkins
2012-11-20  1:04           ` Gregory Farnum
2012-11-20  2:48             ` Noah Watkins
2012-11-20  3:28               ` Sage Weil
2012-11-20 21:59                 ` Noah Watkins