* Re: ceph/hadoop benchmarking
2011-10-26 21:34 ` ceph/hadoop benchmarking Sage Weil
@ 2011-10-26 21:57 ` Gregory Farnum
2011-10-26 22:08 ` Noah Watkins
2011-10-26 22:36 ` Tommi Virtanen
2011-12-13 16:30 ` Noah Watkins
2 siblings, 1 reply; 9+ messages in thread
From: Gregory Farnum @ 2011-10-26 21:57 UTC (permalink / raw)
To: Noah Watkins, ceph-devel, Sage Weil
On Wed, Oct 26, 2011 at 2:34 PM, Sage Weil <sage@newdream.net> wrote:
> [adding ceph-devel CC]
>
> On Wed, 26 Oct 2011, Noah Watkins wrote:
>> ----- Original Message -----
>> > From: "Sage Weil" <sage@newdream.net>
>> >
>> > >
>> > > There was some packaging/cleanup work mentioned a while back on the
>> > > mailing list. Is anyone working on this?
>> >
>> > Nope.
>> >
>> > I'm not really sure what the "right" way to build/expose/package java
>> > bindings is.. but presumably it's less annoying than what we have now
>> > :)
>>
>> I'm revisiting this now to prepare packages and instructions for the
>> students running benchmarks this quarter, but I wanted to get your
>> input on future goals to not waste too much effort doing short-term
>> stuff. The possible approaches I see:
>>
>> 1) Everything (JNI wrappers + Java) lives in the Ceph tree and build
>> instructions include applying a patch to Hadoop.
>>
>> This is the current solution; it isn't too bad but needs documentation. This
>> approach can be simplified to avoid patching Hadoop by integrating
>> Ceph-specific Java code using the CLASSPATH global variable.
>>
>> 2) Everything is sent to Hadoop upstream.
>>
>> This is convenient because the Hadoop infrastructure already has
>> the facilities for building and linking native code into Hadoop,
>> and the only dependency then becomes a standard ceph-devel
>> installation.
>>
>> This was the approach taken with the kernel client version which
>> also included JNI code (for ioctl).
>>
>> 3) Only JNI wrappers live in the Ceph tree and push Java patch upstream.
>>
>> This could be better if it is anticipated that libceph will see a lot
>> of churn in the future, and we'd avoid pushing more changes upstream.
>
> #3 strikes me as the right approach, since there are potentially other
> Java users of libcephfs, and the Hadoop CephFileSystem class will be best
> maintained (and most usable) if it is upstream. There will just be the
> initial pain of getting the packaging right for libcephfs so that it will
> work w/ hadoop out of the box. (I'm assuming that is possible.. e.g.
> apt-get install ceph, fire up hadoop with the proper config?)
>
> sage
So I'd agree that #3 should be our eventual goal, but if we want
proper Java libceph/librados bindings (and to use these in Hadoop)
there's a lot of work that needs to be done before we try and push the
Java code upstream. The current library is *not* generic and contains
a lot of Hadoop-specific application logic that will need to be pushed
back out to Java.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-10-26 21:57 ` Gregory Farnum
@ 2011-10-26 22:08 ` Noah Watkins
0 siblings, 0 replies; 9+ messages in thread
From: Noah Watkins @ 2011-10-26 22:08 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Sage Weil
----- Original Message -----
> From: "Gregory Farnum" <gregory.farnum@dreamhost.com>
>
> So I'd agree that #3 should be our eventual goal, but if we want
> proper Java libceph/librados bindings (and to use these in Hadoop)
> there's a lot of work that needs to be done before we try and push the
> Java code upstream. The current library is *not* generic and contains
> a lot of Hadoop-specific application logic that will need to be pushed
> back out to Java.
To make sure I understand the generic/Hadoop-specific code issue, the
current JNI wrappers contain a ceph_isdirectory routine, but libcephfs
only contains lstat, upon which ceph_isdirectory is built. So, these
need to be further pulled apart.
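
A minimal sketch of that split, assuming a generic binding that exposes only lstat and a Hadoop-side helper that derives the directory check from it. The names CephStat, CephMount, CephPaths and FakeMount are hypothetical and are not taken from the Ceph tree; the in-memory fake only exists so the sketch runs without a cluster.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the part of struct stat a generic binding would expose. */
final class CephStat {
    // POSIX file-type bits from <sys/stat.h> (octal literals).
    static final int S_IFMT = 0170000;
    static final int S_IFDIR = 0040000;
    static final int S_IFREG = 0100000;

    final int mode;

    CephStat(int mode) {
        this.mode = mode;
    }
}

/** Hypothetical generic binding: mirrors the library call, nothing Hadoop-specific. */
interface CephMount {
    CephStat lstat(String path) throws FileNotFoundException;
}

/** Hadoop-side helper: the directory check lives in Java, built on lstat. */
final class CephPaths {
    private CephPaths() {}

    static boolean isDirectory(CephMount mount, String path) {
        try {
            CephStat st = mount.lstat(path);
            return (st.mode & CephStat.S_IFMT) == CephStat.S_IFDIR;
        } catch (FileNotFoundException e) {
            // Assumption: a missing path is reported as "not a directory".
            return false;
        }
    }
}

/** In-memory fake so the sketch can run without a Ceph cluster. */
final class FakeMount implements CephMount {
    private final Map<String, Integer> modes = new HashMap<>();

    FakeMount add(String path, int mode) {
        modes.put(path, mode);
        return this;
    }

    @Override
    public CephStat lstat(String path) throws FileNotFoundException {
        Integer mode = modes.get(path);
        if (mode == null) {
            throw new FileNotFoundException(path);
        }
        return new CephStat(mode);
    }
}
```

With this shape the native layer stays a thin mirror of the library, and convenience routines such as the directory check can change on the Hadoop side without touching JNI code.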
Thanks for the feedback guys. This plan seems real good.
-Noah
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-10-26 21:34 ` ceph/hadoop benchmarking Sage Weil
2011-10-26 21:57 ` Gregory Farnum
@ 2011-10-26 22:36 ` Tommi Virtanen
2011-10-26 22:41 ` Noah Watkins
2011-12-13 16:30 ` Noah Watkins
2 siblings, 1 reply; 9+ messages in thread
From: Tommi Virtanen @ 2011-10-26 22:36 UTC (permalink / raw)
To: Sage Weil; +Cc: Noah Watkins, Gregory Farnum, ceph-devel
On Wed, Oct 26, 2011 at 14:34, Sage Weil <sage@newdream.net> wrote:
>> 1) Everything (JNI wrappers + Java) lives in the Ceph tree and build
>> instructions include applying a patch to Hadoop.
>>
>> This is the current solution; it isn't too bad but needs documentation. This
>> approach can be simplified to avoid patching Hadoop by integrating
>> Ceph-specific Java code using the CLASSPATH global variable.
From my experience with Hadoop, the CLASSPATH approach seems
definitely good enough, and gives us more flexibility.
I don't know very much about the current patch in the Apache tracker,
but there really should be no need to patch the Hadoop upstream. Just
provide a jar with our filesystem plugin, change
fs.default.name=ceph://something/,
fs.ceph.impl=com.ceph.hadoop.CephFileSystem (whatever is in the jar).
http://hadoop.apache.org/common/docs/current/cluster_setup.html#Configuring+the+Hadoop+Daemons
http://hadoop.apache.org/common/docs/current/core-default.html
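
For illustration, the two settings above would end up as property entries in Hadoop's site configuration file. The sketch below only renders that XML from a map; it does not use any Hadoop API. The class name CoreSiteSketch is hypothetical, and the values are the placeholders from the message above ("ceph://something/" and whatever class name the jar actually ships), not confirmed names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Renders key/value pairs in the property layout used by Hadoop's *-site.xml files. */
final class CoreSiteSketch {
    private CoreSiteSketch() {}

    /** Placeholder values copied from the thread; adjust to the real cluster and jar. */
    static Map<String, String> cephDefaults() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.default.name", "ceph://something/");
        props.put("fs.ceph.impl", "com.ceph.hadoop.CephFileSystem");
        return props;
    }

    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    /** Escapes the characters that are unsafe in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(render(cephDefaults()));
    }
}
```

The jar providing the filesystem class still has to be visible on Hadoop's classpath for these settings to take effect.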
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-10-26 22:36 ` Tommi Virtanen
@ 2011-10-26 22:41 ` Noah Watkins
2011-10-26 22:43 ` Gregory Farnum
0 siblings, 1 reply; 9+ messages in thread
From: Noah Watkins @ 2011-10-26 22:41 UTC (permalink / raw)
To: Tommi Virtanen; +Cc: Gregory Farnum, ceph-devel, Sage Weil
----- Original Message -----
> From: "Tommi Virtanen" <tommi.virtanen@dreamhost.com>
>
> From my experience with Hadoop, the CLASSPATH approach seems
> definitely good enough, and gives us more flexibility.
I think that maybe the only thing that wouldn't work in this case
is integration into the Hadoop test harness. Other than that it
should work. Maybe there is a way to integrate?
-Noah
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-10-26 22:41 ` Noah Watkins
@ 2011-10-26 22:43 ` Gregory Farnum
2011-10-26 22:46 ` Noah Watkins
0 siblings, 1 reply; 9+ messages in thread
From: Gregory Farnum @ 2011-10-26 22:43 UTC (permalink / raw)
To: Noah Watkins, ceph-devel, Sage Weil
On Wed, Oct 26, 2011 at 3:41 PM, Noah Watkins <jayhawk@soe.ucsc.edu> wrote:
> ----- Original Message -----
>> From: "Tommi Virtanen" <tommi.virtanen@dreamhost.com>
>>
>> From my experience with Hadoop, the CLASSPATH approach seems
>> definitely good enough, and gives us more flexibility.
>
> I think that maybe the only thing that wouldn't work in this case
> is integration into the Hadoop test harness. Other than that it
> should work. Maybe there is a way to integrate?
My main concern with this approach as a long-term solution (after
we've solved our own teething issues) is the branching mess they have
going on right now, and the pain of maintaining multiple APIs. But if
everybody uses the Cloudera distro anyway, maybe this isn't actually a
big deal?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-10-26 22:43 ` Gregory Farnum
@ 2011-10-26 22:46 ` Noah Watkins
0 siblings, 0 replies; 9+ messages in thread
From: Noah Watkins @ 2011-10-26 22:46 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Sage Weil
----- Original Message -----
> From: "Gregory Farnum" <gregory.farnum@dreamhost.com>
>
> My main concern with this approach as a long-term solution (after
> we've solved our own teething issues) is the branching mess they have
> going on right now, and the pain of maintaining multiple APIs. But if
> everybody uses the Cloudera distro anyway, maybe this isn't actually a
> big deal?
I would tend to agree. Once it's upstream there are many more people
helping to move it forward with the natural Hadoop churn/branching.
And, FWIW, people are probably more likely to have a warm fuzzy feeling
that it's in trunk and ready for their use.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-10-26 21:34 ` ceph/hadoop benchmarking Sage Weil
2011-10-26 21:57 ` Gregory Farnum
2011-10-26 22:36 ` Tommi Virtanen
@ 2011-12-13 16:30 ` Noah Watkins
2011-12-13 20:12 ` Tommi Virtanen
2 siblings, 1 reply; 9+ messages in thread
From: Noah Watkins @ 2011-12-13 16:30 UTC (permalink / raw)
To: Sage Weil; +Cc: Noah Watkins, Gregory Farnum, ceph-devel
Comments below
On 10/26/11 2:34 PM, Sage Weil wrote:
> [adding ceph-devel CC]
>
> On Wed, 26 Oct 2011, Noah Watkins wrote:
>> ----- Original Message -----
>>> From: "Sage Weil"<sage@newdream.net>
>>>
>>>>
>>>> There was some packaging/cleanup work mentioned a while back on the
>>>> mailing list. Is anyone working on this?
>>>
>>> Nope.
>>>
>>> I'm not really sure what the "right" way to build/expose/package java
>>> bindings is.. but presumably it's less annoying than what we have now
>>> :)
>>
>> I'm revisiting this now to prepare packages and instructions for the
>> students running benchmarks this quarter, but I wanted to get your
>> input on future goals to not waste too much effort doing short-term
>> stuff. The possible approaches I see:
>>
>> 1) Everything (JNI wrappers + Java) lives in the Ceph tree and build
>> instructions include applying a patch to Hadoop.
>>
>> This is the current solution; it isn't too bad but needs documentation. This
>> approach can be simplified to avoid patching Hadoop by integrating
>> Ceph-specific Java code using the CLASSPATH global variable.
>>
>> 2) Everything is sent to Hadoop upstream.
>>
>> This is convenient because the Hadoop infrastructure already has
>> the facilities for building and linking native code into Hadoop,
>> and the only dependency then becomes a standard ceph-devel
>> installation.
>>
>> This was the approach taken with the kernel client version which
>> also included JNI code (for ioctl).
>>
>> 3) Only JNI wrappers live in the Ceph tree and push Java patch upstream.
>>
>> This could be better if it is anticipated that libceph will see a lot
>> of churn in the future, and we'd avoid pushing more changes upstream.
>
> #3 strikes me as the right approach, since there are potentially other
> Java users of libcephfs, and the Hadoop CephFileSystem class will be best
> maintained (and most usable) if it is upstream. There will just be the
> initial pain of getting the packaging right for libcephfs so that it will
> work w/ hadoop out of the box. (I'm assuming that is possible.. e.g.
> apt-get install ceph, fire up hadoop with the proper config?)
A couple comments about libceph-java: After looking through a bunch of
Debian Java packages, it seems a common approach to packaging JNI/Java
code is using the scheme:
libcephfs-jni --> .so
libcephfs-java --> .jar
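
To show how the two packages relate at runtime, here is a rough sketch of the usual JNI pattern: Java code from the .jar asks the JVM to load the shared object from the -jni package. The library name "cephfs_jni" and the class NativeLoader are assumptions made for illustration, not names from the Ceph packaging.

```java
/** Loads a JNI shared library by its platform-independent name. */
final class NativeLoader {
    private NativeLoader() {}

    /**
     * Returns true if the library was found on java.library.path and loaded,
     * false if the JVM could not locate or link it.
     */
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // mapLibraryName gives the platform-specific file name the JVM looked for.
            System.err.println("could not load " + System.mapLibraryName(name)
                    + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "cephfs_jni" is a hypothetical library name used only for this sketch.
        boolean loaded = tryLoad("cephfs_jni");
        System.out.println(loaded ? "native library loaded" : "native library missing");
    }
}
```

Splitting the .so and the .jar this way means the Java package can declare a dependency on the native package, and a missing native library shows up as a clear load failure rather than a build-time problem.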
The debhelper tool and friends seem to do a good job of packaging up the
Java in this way, but integrating this into Ceph's default Debian
scripts means that anyone would now need a JDK to build Ceph with
dpkg-buildpackage.
Is there a way to parametrize the Debian build process so people who
don't care about Java bindings can proceed? An alternative approach is
to say have another set of Debian build scripts in src/client/java/debian.
Thanks,
Noah
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: ceph/hadoop benchmarking
2011-12-13 16:30 ` Noah Watkins
@ 2011-12-13 20:12 ` Tommi Virtanen
0 siblings, 0 replies; 9+ messages in thread
From: Tommi Virtanen @ 2011-12-13 20:12 UTC (permalink / raw)
To: Noah Watkins; +Cc: Sage Weil, Noah Watkins, Gregory Farnum, ceph-devel
On Tue, Dec 13, 2011 at 08:30, Noah Watkins <jayhawk@cs.ucsc.edu> wrote:
> The debhelper tool and friends seem to do a good job of packaging up the
> Java in this way, but integrating this into Ceph's default Debian scripts
> means that anyone would now need a JDK to build Ceph with
> dpkg-buildpackage.
>
> Is there a way to parametrize the Debian build process so people who don't
> care about Java bindings can proceed? An alternative approach is to say have
> another set of Debian build scripts in src/client/java/debian.
The only parametrization I'm aware of is DEB_BUILD_OPTIONS, and it's
not really meant to be used for toggling off bits of functionality.
And if we supported DEB_BUILD_OPTIONS=nojava, would that mean not
listing the Java SDK in Build-Depends? That sounds like a bad
road to take.
The clean options I see are:
1. require anyone building the debs to have the java sdk debs installed
2. split the java bindings into a separate source package
Frankly, I think 2. is a decent option. Make ceph provide libcephfs-dev
etc, make that a build-dependency of the java bindings. It complicates
"build everything" but simplifies the common case. It lets these
things evolve at different rates. It acts as a good check on whether
we did a good job on libcephfs-dev packaging and API/ABI changes.
I personally dislike it a lot when some upstream project tries to
maintain all the language bindings in the world in the single master
repository. That leads to a "one size fits nobody" world. For example,
even our python bindings are *miserably* maintained compared to most
true-Python projects; for example, they are not PyPI/pip-friendly. The
reason why I've been ok with the python bindings is we use them
internally, for tests etc, and I want to push more in that direction..
but we already have the PHP bindings in a separate repo. We don't need
all the world's bindings in ceph.git.
^ permalink raw reply [flat|nested] 9+ messages in thread