Libification project (SoC)

* Libification project (SoC)
@ 2007-03-16  4:24 Luiz Fernando N. Capitulino
  2007-03-16  4:59 ` Shawn O. Pearce
  2007-03-17  2:24 ` Jakub Narebski
  0 siblings, 2 replies; 62+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-03-16  4:24 UTC (permalink / raw)
  To: gsoc; +Cc: git

 Hi Shawn,

 I'm going to apply for the libification project and, in order to help
me get started, it would be good to get some feedback regarding the
project's goal and your expectations.

 I'll just dump some thoughts/questions I had, so that we can
start some discussion.

 1. This is a more complete todo list, based on the wiki and a
quick look at the code.

    o Remove static variables
    o Avoid dying when a function call fails (eg, malloc())
    o Input parameter checking (plus errno setting)
    o Documentation (eg, doxygen)
    o Unit-tests
    o Add prefix (eg, git_*) to public API functions

 Do we agree here? Are there more suggestions?

 2. What's the minimum amount of work that needs to be done for
the SoC project to be considered successful?

 3. I don't code in Perl; is that a problem? I mean, the project's
goal is to have a Perl binding, but I think it goes far beyond
that: we could have a python module, a C program, or anything
that shows that libgit is useful.

 Thanks,

-- 
Luiz Fernando N. Capitulino


* Re: Libification project (SoC)
  2007-03-16  4:24 Libification project (SoC) Luiz Fernando N. Capitulino
@ 2007-03-16  4:59 ` Shawn O. Pearce
  2007-03-16  5:30   ` Junio C Hamano
                     ` (2 more replies)
  2007-03-17  2:24 ` Jakub Narebski
  1 sibling, 3 replies; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-16  4:59 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: git

"Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:
>  I'm going to apply for the libification project and, in order to help
> me to get started, would be good to get some feedback regarding the
> project's goal and your expectations.

Excellent!

>  1. This' a more complete todo list, based on the wiki and a
> quick look at the code.
> 
>     o Remove static variables

Yes.  Removing all of these is not completely necessary in the
first version; in fact I would recommend against it.

For example the active_cache variable and its related friends
are referenced a lot.  It contains the index in memory.  I think
it's perfectly OK to say that in the first iteration of a public
libgit.a the process may only use one index at a time, if it
can even use the index at all (see below).  But if you eventually
got around to even helping the index parts of "the Git library",
that would certainly be appreciated!

On the other hand, many of the variables declared in environment.c
are repository specific configuration variables.  These probably
should be abstracted into some sort of wrapper, so that multiple
repositories can be accessed from within the same process.  Why?
A future mod_perl running gitweb.cgi accessing repositories through
libgit.a and Perl bindings, of course!

But static variable removal is low on the priority list for this
project I think.  Our more important issues are related to some of
the other items.

>     o Avoid dying when a function call fails (eg, malloc())

malloc is a huge problem in the Git code today.  Almost all
of our malloc calls are actually through the xmalloc wrapper.
All xmalloc callers assume xmalloc will *never* fail.  This
makes it, uh, interesting.  ;-)

Although one could argue that being unable to malloc needed memory
probably means you're toast, so die()'ing is good.

But other areas die when they get given a bad SHA-1 (for example).
If the library caller can supply that (possibly bad) SHA-1 to an
API function, that's just mean to die out.  ;-)
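
A minimal sketch of the alternative, assuming a hypothetical entry
point named sketch_parse_sha1_hex() (not an existing Git function):
malformed input is reported through the return value, and the caller
decides whether that is fatal.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SHA1_RAWSZ 20

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical library entry point: parse exactly 40 hex digits into
 * 20 raw bytes.  Returns 0 on success and -1 on malformed input.  It
 * never terminates the process.
 */
int sketch_parse_sha1_hex(const char *hex, unsigned char *sha1)
{
	size_t i;

	if (!hex || !sha1)
		return -1;
	for (i = 0; i < SKETCH_SHA1_RAWSZ; i++) {
		int hi = sketch_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;	/* also catches an early NUL */
		lo = sketch_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	/* Reject trailing characters after the 40th digit. */
	return hex[2 * SKETCH_SHA1_RAWSZ] == '\0' ? 0 : -1;
}
```

A command-line program could still die() when this returns -1, while
a binding could turn the same return value into an exception.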

>     o Input parameter checking (plus errno setting)

Yes, of course.  But most functions (at least those that should be
made public) probably already do check their arguments.  Some return
an error code back to their caller; others die() and abort the
current process.  And there are probably a few that don't check
their arguments enough.  But I think input parameter checking is
probably going to be a relatively small task here.

Although sometimes the input checking is done in the program that
calls the function, and not the function itself.  So that might
need to be refactored in a few spots.

>     o Documentation (eg, doxygen)

Yes; very important for the library to be of any use to anyone else.

>     o Unit-tests

Of the public API, yes.  Our current test suite covers some of that
code that we want to make public, but does so through programs that
call those functions.  We would want unit tests to verify the public
API conforms to the expectations of the unit test's writer.  ;-)

>     o Add prefix (eg, git_*) to public API functions

Yes.  But which functions shall we expose?  ;-)

See below for functionality I'm thinking about; others may have
different ideas.

      o Build system issues

You missed this, but I think it's an important consideration.
Our current libgit.a is a static library that has a relatively large
number of symbols its modules are exporting.  These symbol names
aren't namespace-ized (e.g. git_* prefix) so we wouldn't want to
just offer this library up in its current form.

Some of those symbols would get name changes (as you suggest above),
but others might not (e.g. the active_cache that I suggest further
above).  These modules might need to be moved out of libgit.a and
moved into say a new libgitprivate.a, that our own code can link
against, but that isn't offered to the public as a stable API.

      o Public header definition

Whatever we expose, we will need to draft a public "git.h"
(or somesuch) that callers can rely upon.  It will need to be
fairly stable, and handle revisions as new features get added.
E.g. version testing support like the zlib and cURL library have,
and that we rely upon in Git to do feature checks.  ;-)
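
As a rough illustration of that kind of version testing, here is a
sketch of what such a header could carry.  Every GITLIB_* name and
gitlib_version_num() are made up for this example; nothing here is an
existing Git interface.

```c
#include <assert.h>

/* Hypothetical contents of a public "git.h"; these names are assumptions. */
#define GITLIB_VERSION_MAJOR 0
#define GITLIB_VERSION_MINOR 1
#define GITLIB_VERSION_PATCH 0

/* Pack a version into one integer so it can be compared in #if. */
#define GITLIB_MAKE_VERSION(major, minor, patch) \
	(((major) << 16) | ((minor) << 8) | (patch))

#define GITLIB_VERSION_NUM \
	GITLIB_MAKE_VERSION(GITLIB_VERSION_MAJOR, \
			    GITLIB_VERSION_MINOR, \
			    GITLIB_VERSION_PATCH)

/* Run-time counterpart: the version this library code was built as. */
unsigned long gitlib_version_num(void)
{
	return GITLIB_VERSION_NUM;
}

/* Caller side: a compile-time feature check against the header. */
#if GITLIB_VERSION_NUM >= GITLIB_MAKE_VERSION(0, 1, 0)
#define CALLER_HAS_SHORTLOG_API 1
#else
#define CALLER_HAS_SHORTLOG_API 0
#endif
```

The compile-time macro tells a caller which declarations the header
offers; the run-time function lets it detect a mismatch between the
header it was built against and the library it ends up linked with.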

>  2. What's the minimum amount of work that need to be done for
> the SoC project to be considered successful?

I'd like to see enough API support that gitweb.cgi could:

 * get the most recent commit date of all refs in all projects
   (the toplevel project index page);
 * get a shortlog for the main summary page of a project;
 * get the full content of a single commit;
 * get the "raw" diff (paths that changed) for two commits;

There's a thousand other things that gitweb.cgi would still need to
fully avoid forking Git processes.  But that's a really good start,
and is probably going to be a decent chunk of work.  Especially to
create high-quality patches that pass our standards review.  ;-)

In some cases much of the above is already "internally public";
meaning we already treat parts of that code as a library and invoke
them from within processes to get work done.  Much of this project
is about improving the interfaces and behavior enough to make those
existing APIs truly public.

See refs.h, diff.h, revision.h, commit.h...

>  3. I don't code in Perl, is it a problem? I mean, the project's
> goal is to have a Perl binding but I think it goes far from
> that: we could have a python module, a C program, or anything
> that shows the libgit is useful.

No, I don't see that as a problem at all.  We have some Perl
experts on the mailing list who would like to see Perl bindings.
Some of the Perl binding is pure C code, and some of it is this
weird Perl macro language...  so I expect those Perl experts to come
out of the woodwork and help the community to create a prototype
set of bindings.  There's also Ruby and Python interests around,
so we may see bindings for those too.  ;-)

From a goal perspective of this SoC project, any functioning binding
that can support a gitweb type of application would be great.
It shows the library works as intended, is useful, and can
continue to be built upon.  That's a pretty successful project in
my mind.

-- 
Shawn.


* Re: Libification project (SoC)
  2007-03-16  4:59 ` Shawn O. Pearce
@ 2007-03-16  5:30   ` Junio C Hamano
  2007-03-16  6:00     ` Shawn O. Pearce
                       ` (2 more replies)
  2007-03-16  8:06   ` Johannes Sixt
  2007-03-16 12:55   ` Petr Baudis
  2 siblings, 3 replies; 62+ messages in thread
From: Junio C Hamano @ 2007-03-16  5:30 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Luiz Fernando N. Capitulino, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> On the other hand, many of the variables declared in environment.c
> are repository specific configuration variables.  These probably
> should be abstracted into some sort of wrapper, so that multiple
> repositories can be accessed from within the same process.  Why?
> a future mod_perl running gitweb.cgi accessing repositories through
> libgit.a and Perl bindings of course!

I think if you are abstracting them out, into "struct repo_state",
the index and object store related variables such as packed_git
should go there as well, so your recommendation feels very
inconsistent to me.

>>     o Avoid dying when a function call fails (eg, malloc())
>
> malloc is a huge problem in the Git code today.  Almost all
> of our malloc calls are actually through the xmalloc wrapper.
> All xmalloc callers assume xmalloc will *never* fail.  This
> makes it, uh, interesting.  ;-)

Actually they do not assume such.  What they assume is worse.
They assume that there is nothing else you can do other than
dying when allocation fails.

> But other areas die when they get given a bad SHA-1 (for example).
> If the library caller can supply that (possibly bad) SHA-1 to an
> API function, that's just mean to die out.  ;-)

That's a real problem, but on the other hand, perl or whatever
wrapped ones can do the dying (or not dying) before calling into
libgit, so it may not be such a big issue.

>>     o Documentation (eg, doxygen)
>>     o Unit-tests
>>     o Add prefix (eg, git_*) to public API functions
>
> Yes.  But which functions shall we expose?  ;-)

Before going into that topic, a bigger question is if we are
happy with the current internal API and what the goal of
libification is.  If the libification is going to say that "this
is a published API so we are not going to change it", I would
imagine that it would be very hard to accept in the mainline.
Improvements like the earlier sliding mmap() series need to be
able to change the interfaces without backward compatibility
wart.

In other words, I do not know what idiot ^W ^W who listed the
libification stuff on the SoC "ideas" page, but I think (1) it
is premature to promise stable ABI, and (2) if it does not
promise stable ABI a library is not very useful.

>       o Build system issues
>
> You missed this, but I think its an important consideration.
> Our current libgit.a is a static library that has a relatively large
> number of symbols its modules are exporting.  These symbol names
> aren't namespace-ized (e.g. git_* prefix) so we wouldn't want to
> just offer this library up in its current form.

Very true, in fact, the current libgit.a is _NOT_ a library at
all.  It is just a way to be terse in our Makefile to make the
linker do the work for us, nothing more.

And I do not think we would want to rename our "internally
public" functions such as find_pack_entry_one() and
sha1_object_info() with git_ prefix only for the purpose of this
libification.

If we can trick the linker into creating a gitlib.so which defines the
symbol git_sha1_object_info() that lets the caller call our
internal sha1_object_info(), without exposing the internal name
sha1_object_info(), and strips other global names that libgit.a and
the plumbing use internally to communicate with each other, such as
find_pack_entry_one(), from the gitlib.so library, that would be
a good solution.
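
One way to approximate this, sketched under the assumption of a GCC
or Clang toolchain, is to mark symbols with visibility attributes.
The function names and the simplified signature below are stand-ins,
not the real sha1_object_info() interface.

```c
#include <assert.h>

/* GCC/Clang extension; assumes one of those compilers. */
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#define SKETCH_EXPORT __attribute__((visibility("default")))

/*
 * Stand-in for an internal function that stays global inside the
 * project but is not exported from the shared object.  The body is
 * a dummy that reports size 0.
 */
SKETCH_HIDDEN int sketch_internal_object_info(const unsigned char *sha1,
					       unsigned long *sizep)
{
	(void)sha1;
	if (sizep)
		*sizep = 0;
	return 0;
}

/*
 * Stand-in for the public wrapper: the only name a library user
 * would see.  It validates its argument and forwards the call.
 */
SKETCH_EXPORT int git_sketch_object_info(const unsigned char *sha1,
					 unsigned long *sizep)
{
	if (!sha1)
		return -1;
	return sketch_internal_object_info(sha1, sizep);
}
```

Compiling everything with -fvisibility=hidden and annotating only
the wrappers, or listing the wrappers in a linker version script,
would have a similar effect without touching the internal
declarations.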

>>  2. What's the minimum amount of work that need to be done for
>> the SoC project to be considered successful?
>
> I'd like to see enough API support that gitweb.cgi could:
>
>  * get the most recent commit date of all refs in all projects
>    (the toplevel project index page);
>  * get a shortlog for the main summary page of a project;
>  * get the full content of a single commit;
>  * get the "raw" diff (paths that changed) for two commits;

I would disagree with tying libification and Perl binding this
way.  If the goal is to get faster gitweb, then that does not
necessarily have to be libified git.  Let one person who does
the libification come up with a decent C binding and let others
worry about Perl bindings.

> In some cases much of the above is already "internally public";
> meaning we already treat parts of that code as a library and invoke
> them from within processes to get work done.  Much of this project
> is about improving the interfaces and behavior enough to make those
> existing APIs truly public.

One big thing you forgot to mention is that whatever form it
takes, the libification should not impact performance of
existing plumbing.  These interfaces are "internally" public
exactly because the callers still honor underlying convention
such as not having to clean-up the object flags for the last
invocation.  If you libify in the wrong way, you would end up with an
implementation of the interface that always cleans up (because
you would not know if you are part of a long-living process, so
you would clean up just in case you are called again later),
which would be unusable from the plumbing point of view.


* Re: Libification project (SoC)
  2007-03-16  5:30   ` Junio C Hamano
@ 2007-03-16  6:00     ` Shawn O. Pearce
  2007-03-16  6:54       ` Junio C Hamano
  2007-03-16 12:53     ` Petr Baudis
  2007-03-16 13:47     ` Luiz Fernando N. Capitulino
  2 siblings, 1 reply; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-16  6:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Luiz Fernando N. Capitulino, git

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > On the other hand, many of the variables declared in environment.c
> > are repository specific configuration variables.  These probably
> > should be abstracted into some sort of wrapper, so that multiple
> > repositories can be accessed from within the same process.  Why?
> > a future mod_perl running gitweb.cgi accessing repositories through
> > libgit.a and Perl bindings of course!
> 
> I think if you are abstracting them out, into "struct repo_state",
> the index and object store related variables such as packed_git
> should go there as well, so your recommendation feels very
> inconsistent to me.

I missed packed_git, but you are right, that should definitely go
with a struct repo_state.  And maybe you are right that the index
should go with it... but I'm not sure the index should be tied to the
repository at all.  It's strictly convention that the index goes with
the repository; GIT_INDEX_FILE lets you say otherwise at the command
line level, so why can't we do otherwise at the library level too?

> >>     o Add prefix (eg, git_*) to public API functions
> >
> > Yes.  But which functions shall we expose?  ;-)
> 
> Before going into that topic, a bigger question is if we are
> happy with the current internal API and what the goal of
> libification is.  If the libification is going to say that "this
> is a published API so we are not going to change it", I would
> imagine that it would be very hard to accept in the mainline.

I'm looking at a middleground between our current "moving target"
internal API and our "frozen" plumbing process based API.  There
are a number of places where just being able to get data *out*
of Git easily would be useful, but doing so right now is awkward.
Either you code against our "moving target" internal API by creating
a new builtin (e.g. my builtin-statplog) where it's easy to get what
you want, or you code against the plumbing based tools, where it's
sometimes not so easy...

Most of the data formats aren't changing; a commit is a commit is
a commit.  It has a tree, parents, author, committer, message.

> Improvements like the earlier sliding mmap() series need to be
> able to change the interfaces without backward compatibility
> wart.

I agree.  But I also think the use_mmap() API is just way too low
level for a public library.  That particular change was pretty
low level.

Think higher, like "struct commit".  That is actually too low still,
as it doesn't really help you with the author and committer.

> In other words, I do not know what idiot ^W ^W who listed the
> libification stuff on the SoC "ideas" page,

I'm the idiot ^W individual responsible.  ;-)

> I would disagree with tying libification and Perl binding this
> way.  If the goal is to get faster gitweb, then that does not
> necessarily have to be libified git.  Let one person who does
> the libification come up with a decent C binding and let others
> worry about Perl bindings.

Yes.  However Perl bindings are often asked for.  And Marco Costalba
might like a working libgit that he could use for revision fetching
in qgit.  I think that if patches for a library started to appear,
another interested party would start to at least play with them.

> One big thing you forgot to mention is that whatever form it
> takes, the libification should not impact performance of
> existing plumbing.  These interfaces are "internally" public
> exactly because the callers still honor underlying convention
> such as not having to clean-up the object flags for the last
> invocation.  If you libify in a wrong way, you would end up an
> implementation of the interface that always cleans up (because
> you would not know if you are part of a long-living process so
> you will clean-up just in case you will still be called later),
> which would be unusable from the plumbing point-of-view.

I didn't forget; I just simply did not mention it.  I was considering
writing something to that effect, and probably should have.

This is a really valid point.  Git is insanely fast, partly because
we have a lot of "run once" types of applications and we have
optimized for those.  Any sort of "run many times" reuse needs to
not make the "run once" guy pay for something he will not use.

A good example of this is in git-describe, where we use the object
flags, and only bother to clear them out if there is another commit
remaining to be described.
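
The pattern can be sketched as follows.  This is not the git-describe
code; sketch_object, SKETCH_SEEN and the functions are invented for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SEEN 1u

struct sketch_object {
	unsigned flags;
};

/* Pretend traversal: mark every object it touches. */
static void sketch_walk(struct sketch_object *objs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		objs[i].flags |= SKETCH_SEEN;
}

/* Reset the marks so another traversal can start clean. */
static void sketch_clear_flags(struct sketch_object *objs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		objs[i].flags &= ~SKETCH_SEEN;
}

/*
 * Run `rounds` traversals over the same object pool.  Flags are
 * reset only between rounds, never after the last one, so a
 * run-once caller pays nothing for cleanup.  Returns the number of
 * cleanups that were performed.
 */
int sketch_describe_all(struct sketch_object *objs, size_t n, int rounds)
{
	int r, cleanups = 0;

	for (r = 0; r < rounds; r++) {
		sketch_walk(objs, n);
		if (r + 1 < rounds) {
			sketch_clear_flags(objs, n);
			cleanups++;
		}
	}
	return cleanups;
}
```

A library interface would have to preserve this property, for
instance by making the cleanup a separate call that only long-lived
callers use.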

-- 
Shawn.


* Re: Libification project (SoC)
  2007-03-16  6:00     ` Shawn O. Pearce
@ 2007-03-16  6:54       ` Junio C Hamano
  2007-03-16 11:54         ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2007-03-16  6:54 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Luiz Fernando N. Capitulino, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> "Shawn O. Pearce" <spearce@spearce.org> writes:
>> > On the other hand, many of the variables declared in environment.c
>> > are repository specific configuration variables.  These probably
>> > should be abstracted into some sort of wrapper, so that multiple
>> > repositories can be accessed from within the same process.  Why?
>> > a future mod_perl running gitweb.cgi accessing repositories through
>> > libgit.a and Perl bindings of course!
>> 
>> I think if you are abstracting them out, into "struct repo_state",
>> the index and object store related variables such as packed_git
>> should go there as well, so your recommendation feels very
>> inconsistent to me.
>
> I missed packed_git, but you are right, that should definitely go
> with a struct repo_state.  And maybe you are right that the index
> should go with it... but I'm not sure the index should be tied to the
> repository at all.  It's strictly convention that the index goes with
> the repository; GIT_INDEX_FILE lets you say otherwise at the command
> line level, why can't we do otherwise from a library level too?

Even within a plumbing, being able to shuffle multiple indices
at once would be very useful.  For example, if I were to rewrite
unpack-trees, I would most likely read from the current index
and trees and populate a new index from emptiness by appending
to it, thereby avoiding the binary-search and insert costs.

I've thought about the layering when Smurf first brought up the
libification (which was a loooong time ago), and concluded that a
three-layered approach would be most useful.

The bottom layer is object store across repositories.  If we
ignore SHA-1 collisions as an issue (and we _will_ ignore it for
foreseeable future), unless you are doing "read from one
repository and write that to another repository", it is more
handy to be able to name an object and get its data without
knowing which repository's object store it comes from, and it
would make "git log master~A..master~B" across repositories
(i.e. 'master' of repository A and 'master' of repository B)
possible.  An example interface would be like:

(current)
void *read_sha1_file(const unsigned char *sha1,
		     enum object_type *type,
		     unsigned long *size);

(libified)
void *git_read_sha1_file(struct gitlib *,
			 const unsigned char *sha1,
			 enum object_type *type,
			 unsigned long *size);

where "struct gitlib" has a list of "struct object_store", and
we will have:

int git_add_object_store(struct gitlib *, const char *path);

to add one directory as an object store that the toplevel gitlib
structure knows about.  In a sense, "struct gitlib" and the object
store are so global that we might not even need to have them as a
parameter (iow, they and "struct object **obj_hash" from object.c
can stay global).

The middle layer is repositories, primarily their refs and
reflogs.  An example interface would be like:

(current)
int get_sha1(const char *name, unsigned char *sha1);

(libified)
int git_get_sha1(struct git_repo *, const char *name, unsigned char *sha1);

where "struct git_repo" is one repository (and it would have a
pointer to "struct gitlib *" so that we can follow objects to
follow parents and stuff).

And the top layer would have indices, and working trees as
per-invocation parameter.

(current)
int cache_name_pos(const char *name, int namelen);
int unpack_trees(struct object_list *trees, struct unpack_trees_options *o);

(libified)
int git_cache_name_pos(struct git_cache *, const char *name, int namelen);
int git_unpack_trees(struct object_list *trees, struct git_unpack_trees_options *o);

where "struct git_cache" has "index" thingies, such as
active_cache, active_nr, active_alloc, and active_cache_tree.
And we would have a pointer to "struct git_cache *" in the
unpack_trees_options structure.
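
To make the three layers concrete, here is one way the structures
could nest.  All sketch_* names and fields are assumptions made for
this example; only the layering itself comes from the description
above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bottom layer: object stores shared across repositories. */
struct sketch_object_store {
	char *path;
	struct sketch_object_store *next;
};

struct sketch_gitlib {
	struct sketch_object_store *stores;
};

/* Middle layer: one repository (refs, reflogs) pointing at the library. */
struct sketch_git_repo {
	struct sketch_gitlib *lib;
	const char *git_dir;
};

/* Top layer: an index, passed per invocation. */
struct sketch_cache_entry;
struct sketch_git_cache {
	struct sketch_cache_entry **entries;	/* cf. active_cache */
	unsigned nr, alloc;			/* cf. active_nr, active_alloc */
};

struct sketch_unpack_trees_options {
	struct sketch_git_cache *cache;
};

/* Register one directory as an object store; 0 on success, -1 on error. */
int sketch_add_object_store(struct sketch_gitlib *lib, const char *path)
{
	struct sketch_object_store *store;
	size_t len;

	if (!lib || !path)
		return -1;
	store = malloc(sizeof(*store));
	if (!store)
		return -1;
	len = strlen(path) + 1;
	store->path = malloc(len);
	if (!store->path) {
		free(store);
		return -1;
	}
	memcpy(store->path, path, len);
	store->next = lib->stores;	/* newest store goes first */
	lib->stores = store;
	return 0;
}

/* Free every registered object store. */
void sketch_release(struct sketch_gitlib *lib)
{
	while (lib->stores) {
		struct sketch_object_store *next = lib->stores->next;

		free(lib->stores->path);
		free(lib->stores);
		lib->stores = next;
	}
}
```

Two repositories that share one sketch_gitlib would then see each
other's objects, which is what the cross-repository "git log"
example relies on.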


* Re: Libification project (SoC)
  2007-03-16  4:59 ` Shawn O. Pearce
  2007-03-16  5:30   ` Junio C Hamano
@ 2007-03-16  8:06   ` Johannes Sixt
  2007-03-16  8:58     ` Matthieu Moy
  2007-03-16 12:55   ` Petr Baudis
  2 siblings, 1 reply; 62+ messages in thread
From: Johannes Sixt @ 2007-03-16  8:06 UTC (permalink / raw)
  To: git

"Shawn O. Pearce" wrote:
> "Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:
> >     o Avoid dying when a function call fails (eg, malloc())
> 
> malloc is a huge problem in the Git code today.  Almost all
> of our malloc calls are actually through the xmalloc wrapper.
> All xmalloc callers assume xmalloc will *never* fail.  This
> makes it, uh, interesting.  ;-)

You could think about longjmp(3)ing out into main(), which would have to
setjmp(3). But in order to clean up intermediate frames, you would have
to have a stack of setjmp/longjmp buffers.
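
A small sketch of such a stack, with invented names (sketch_die()
stands in for die(), sketch_api_call() for a library entry point).
Each entry point pushes its own jump buffer, and the failure path
unwinds to the innermost one so that frame can release what it owns.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* One saved context per active library entry point. */
struct sketch_frame {
	jmp_buf env;
	struct sketch_frame *prev;
};

static struct sketch_frame *sketch_top;

/* Replacement for die(): pop the innermost frame and jump to it. */
static void sketch_die(void)
{
	struct sketch_frame *f = sketch_top;

	assert(f != NULL);	/* must be called under an entry point */
	sketch_top = f->prev;
	longjmp(f->env, 1);
}

/* Deep internal code that "dies" on error. */
static void sketch_work(int should_fail)
{
	if (should_fail)
		sketch_die();
}

/* Entry point: returns 0 on success, -1 if the internals died. */
int sketch_api_call(int should_fail)
{
	struct sketch_frame frame;
	/* volatile: modified after setjmp() and read after longjmp() */
	char *volatile scratch = NULL;

	frame.prev = sketch_top;
	sketch_top = &frame;
	if (setjmp(frame.env) != 0) {
		/* Arrived via longjmp(): release what this frame owns. */
		free(scratch);
		return -1;
	}
	scratch = malloc(64);
	sketch_work(should_fail);
	sketch_top = frame.prev;	/* normal exit: pop our frame */
	free(scratch);
	return 0;
}
```

Only resources that some frame knows about get released this way;
anything allocated deeper down and not recorded anywhere is leaked.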

Oh, well, how do I *love* them C++ exceptions!

-- Hannes


* Re: Libification project (SoC)
  2007-03-16  8:06   ` Johannes Sixt
@ 2007-03-16  8:58     ` Matthieu Moy
  2007-03-16 11:51       ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: Matthieu Moy @ 2007-03-16  8:58 UTC (permalink / raw)
  To: git

Johannes Sixt <J.Sixt@eudaptics.com> writes:

> You could think about longjmp(3)ing out into main(), which would have to
> setjmp(3). But in order to clean up intermediate frames, you would have
> to have a stack of setjmp/longjmp buffers.
>
> Oh, well, how do I *love* them C++ exceptions!

You can have exceptions in C too.

I've used it a bit while contributing to Baz 1.x (the fork of tla).
The library used was cexcept ( http://cexcept.sourceforge.net/ ).

As you mention, jumping is the easy part, and cleaning up is the hard
one. Baz was using talloc, hacked to somehow work with cexcept. The
mini-library doesn't seem to be available as a tarball anymore, so I
did the checkout+targz in case someone's curious to have a look, and
lazy enough not to install baz to get it:

http://www-verimag.imag.fr/~moy/tmp/talloc-except--2.0.1--patch-2.tar.gz

This stuff is not supported anymore, but very small anyway.

-- 
Matthieu


* Re: Libification project (SoC)
  2007-03-16  8:58     ` Matthieu Moy
@ 2007-03-16 11:51       ` Johannes Schindelin
  0 siblings, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 11:51 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

Hi,

On Fri, 16 Mar 2007, Matthieu Moy wrote:

> Johannes Sixt <J.Sixt@eudaptics.com> writes:
> 
> > You could think about longjmp(3)ing out into main(), which would have to
> > setjmp(3). But in order to clean up intermediate frames, you would have
> > to have a stack of setjmp/longjmp buffers.
> >
> > Oh, well, how do I *love* them C++ exceptions!
> 
> You can have exceptions in C too.
> 
> I've used it a bit while contributing to Baz 1.x (the fork of tla).
> The library used was cexcept ( http://cexcept.sourceforge.net/ ).
> 
> As you mention, jumping is the easy part, and cleaning up is the hard
> one. Baz was using talloc, hacked to somehow work with cexcept. The
> mini-library doesn't seem to be available as a tarball anymore, so I
> did the checkout+targz in case someone's curious to have a look, and
> lazy enough not to install baz to get it:
> 
> http://www-verimag.imag.fr/~moy/tmp/talloc-except--2.0.1--patch-2.tar.gz
> 
> This stuff is not supported anymore, but very small anyway.

I was thinking about a similar approach some time ago. But that means that 
you _must not_ have static variables that you rely on being initialised 
correctly.

I mean, we have xmalloc(), and it would be easy to enforce xfree(), too 
(which would be good for memory profiling anyway), and we _could_ hack 
that into tracking which pointers were returned after which checkpoint.

But we _cannot_ say which static variables should be initialised (and 
how), after some "exception" was thrown at a certain point.
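
A minimal sketch of the pointer-tracking half of that idea.  The
sketch_* functions and the fixed-size table are assumptions for
illustration; static variables, the part that cannot be solved this
way, are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_TRACKED 1024

/* Pointers handed out so far, in allocation order. */
static void *sketch_tracked[SKETCH_MAX_TRACKED];
static size_t sketch_nr_tracked;

/* Allocate and remember the pointer; returns NULL instead of dying. */
void *sketch_xmalloc(size_t size)
{
	void *p;

	if (sketch_nr_tracked == SKETCH_MAX_TRACKED)
		return NULL;
	p = malloc(size ? size : 1);
	if (p)
		sketch_tracked[sketch_nr_tracked++] = p;
	return p;
}

/* Free one pointer; its slot is blanked so the order is preserved. */
void sketch_xfree(void *p)
{
	size_t i;

	if (!p)
		return;
	for (i = sketch_nr_tracked; i > 0; i--) {
		if (sketch_tracked[i - 1] == p) {
			sketch_tracked[i - 1] = NULL;
			break;
		}
	}
	free(p);
}

/* A checkpoint is simply the current length of the table. */
size_t sketch_checkpoint(void)
{
	return sketch_nr_tracked;
}

/*
 * Free everything allocated since the checkpoint and still live.
 * Returns the number of blocks that had to be freed.
 */
size_t sketch_rollback(size_t checkpoint)
{
	size_t freed = 0;

	assert(checkpoint <= sketch_nr_tracked);
	while (sketch_nr_tracked > checkpoint) {
		void *p = sketch_tracked[--sketch_nr_tracked];

		if (p) {
			free(p);
			freed++;
		}
	}
	return freed;
}
```

An "exception" handler would call sketch_rollback() with the
checkpoint taken when the library was entered.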

Ciao,
Dscho


* Re: Libification project (SoC)
  2007-03-16  6:54       ` Junio C Hamano
@ 2007-03-16 11:54         ` Johannes Schindelin
  2007-03-16 13:09           ` Rocco Rutte
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 11:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Luiz Fernando N. Capitulino, git

Hi,

On Thu, 15 Mar 2007, Junio C Hamano wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > Junio C Hamano <junkio@cox.net> wrote:
> >> "Shawn O. Pearce" <spearce@spearce.org> writes:
> >> > On the other hand, many of the variables declared in environment.c
> >> > are repository specific configuration variables.  These probably
> >> > should be abstracted into some sort of wrapper, so that multiple
> >> > repositories can be accessed from within the same process.  Why?
> >> > a future mod_perl running gitweb.cgi accessing repositories through
> >> > libgit.a and Perl bindings of course!
> >> 
> >> I think if you are abstracting them out, into "struct repo_state",
> >> the index and object store related variables such as packed_git
> >> should go there as well, so your recommendation feels very
> >> inconsistent to me.
> >
> > I missed packed_git, but you are right, that should definitely go
> > with a struct repo_state.  And maybe you are right that the index
> > should go with it... but I'm not sure the index should be tied to the
> > repository at all.  It's strictly convention that the index goes with
> > the repository; GIT_INDEX_FILE lets you say otherwise at the command
> > line level, why can't we do otherwise from a library level too?
> 
> Even within a plumbing, being able to shuffle multiple indices
> at once would be very useful.  For example, if I were to rewrite
> unpack-trees, I would most likely read from the current index
> and trees and populate a new index from emptiness by appending
> to it, thereby avoiding the binary-search and insert costs.
> 
> I've thought about the layering when Smurf first brought up the
> libification (which was a loooong time ago), and concluded three
> layered approach would be most useful.
> 
> The bottom layer is object store across repositories.  If we
> ignore SHA-1 collisions as an issue (and we _will_ ignore it for
> foreseeable future), unless you are doing "read from one
> repository and write that to another repository", it is more
> handy to be able to name an object and get its data without
> knowing which repository's object store it comes from, and it
> would make "git log master~A..master~B" across repositories
> (i.e. 'master' of repository A and 'master' of repository B)
> possible.  An example interface would be like:
> 
> (current)
> void *read_sha1_file(const unsigned char *sha1,
> 		     enum object_type *type,
> 		     unsigned long *size);
> 
> (libified)
> void *git_read_sha1_file(struct gitlib *,
> 			 const unsigned char *sha1,
> 			 enum object_type *type,
> 			 unsigned long *size);
> 
> where "struct gitlib" has a list of "struct object_store", and
> we will have:
> 
> int git_add_object_store(struct gitlib *, const char *path);
> 
> to add one directory as object store the toplevel gitlib structure
> knows about.  In a sense, "struct gitlib" and object store is so
> global that we might not even need to have it as a parameter
> (iow, it and "struct object **obj_hash" from object.c can stay
> global).
> 
> The middle layer is repositories, primarily their refs and
> reflogs.  An example interface would be like:
> 
> (current)
> int get_sha1(const char *name, unsigned char *sha1);
> 
> (libified)
> int git_get_sha1(struct git_repo *, const char *name, unsigned char *sha1);
> 
> where "struct git_repo" is one repository (and it would have a
> pointer to "struct gitlib *" so that we can follow objects to
> follow parents and stuff).
> 
> And the top layer would have indices, and working trees as
> per-invocation parameter.
> 
> (current)
> int cache_name_pos(const char *name, int namelen);
> int unpack_trees(struct object_list *trees, struct unpack_trees_options *o);
> 
> (libified)
> int git_cache_name_pos(struct git_cache *, const char *name, int namelen);
> int git_unpack_trees(struct object_list *trees, struct git_unpack_trees_options *o);
> 
> where "struct git_cache" has "index" thingies, such as
> active_cache, active_nr, active_alloc, and active_cache_tree.
> And we would have pointer to "struct git_cache *" in unpack_trees_options
> structure.

Isn't this an awfully long shot?

I'd be happy if the libification project resulted

- in a (static!) libgit.a which can be linked to qgit or similar (being 
  reentrant, or at least optionally so, and not die()ing all the time), 
  and

- which does not fix the API yet (at least for the most parts).

We _can_ -- once we agree on a stable API -- expose _some_ functions in a 
libgit.so, but that does not have to be the goal for the first step!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16  5:30   ` Junio C Hamano
  2007-03-16  6:00     ` Shawn O. Pearce
@ 2007-03-16 12:53     ` Petr Baudis
  2007-03-16 13:47     ` Luiz Fernando N. Capitulino
  2 siblings, 0 replies; 62+ messages in thread
From: Petr Baudis @ 2007-03-16 12:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Luiz Fernando N. Capitulino, git

On Fri, Mar 16, 2007 at 06:30:46AM CET, Junio C Hamano wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > But other areas die when they get given a bad SHA-1 (for example).
> > If the library caller can supply that (possibly bad) SHA-1 to an
> > API function, that's just mean to die out.  ;-)
> 
> That's a real problem, but on the other hand, perl or whatever
> wrapped ones can do the dying (or not dying) before calling into
> libgit, so it may not be such a big issue.

At least you can catch the die from the library caller using
set_*_routine(). ;-)
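
A minimal sketch of what catching a die from the caller's side could look like. Everything prefixed demo_ is a hypothetical stand-in written for this illustration, not git's actual code; it only assumes the library lets the caller install its own die handler, and it uses setjmp()/longjmp() to get control back.

```c
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* --- hypothetical library side --- */
typedef void (*demo_die_fn)(const char *fmt, va_list ap);

static void demo_default_die(const char *fmt, va_list ap)
{
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	exit(128);
}

static demo_die_fn demo_die_routine = demo_default_die;

void demo_set_die_routine(demo_die_fn fn)
{
	demo_die_routine = fn;
}

void demo_die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	demo_die_routine(fmt, ap);
	va_end(ap);
	exit(128); /* a handler that returns still ends the process */
}

/* A library function that dies when handed a malformed object name. */
int demo_check_sha1_hex(const char *hex)
{
	if (strlen(hex) != 40)
		demo_die("bad SHA-1: '%s'", hex);
	return 0;
}

/* --- hypothetical caller side (e.g. a language binding) --- */
static jmp_buf demo_recover;
char demo_last_error[256];

static void demo_catching_die(const char *fmt, va_list ap)
{
	/* record the message, then jump back instead of exiting */
	vsnprintf(demo_last_error, sizeof(demo_last_error), fmt, ap);
	longjmp(demo_recover, 1);
}

/* Returns 0 on success, -1 if the library tried to die. */
int demo_try_check(const char *hex)
{
	demo_set_die_routine(demo_catching_die);
	if (setjmp(demo_recover))
		return -1;
	return demo_check_sha1_hex(hex);
}
```

Jumping out of the library this way skips whatever cleanup the interrupted function would have done, so anything it had allocated or locked at that point stays that way.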

> >>     o Documentation (eg, doxygen)
> >>     o Unit-tests
> >>     o Add prefix (eg, git_*) to public API functions
> >
> > Yes.  But which functions shall we expose?  ;-)
> 
> Before going into that topic, a bigger question is if we are
> happy with the current internal API and what the goal of
> libification is.  If the libification is going to say that "this
> is a published API so we are not going to change it", I would
> imagine that it would be very hard to accept in the mainline.
> Improvements like the earlier sliding mmap() series need to be
> able to change the interfaces without backward compatibility
> wart.
> 
> In other words, I do not know what idiot ^W ^W who listed the
> libification stuff on the SoC "ideas" page, but I think (1) it
> is premature to promise stable ABI, and (2) if it does not
> promise stable ABI a library is not very useful.

I disagree, it can live in the "zero major version" realm and already be
very useful for language bindings (say whatever is bundled with git
itself) and other nifty stuff.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16  4:59 ` Shawn O. Pearce
  2007-03-16  5:30   ` Junio C Hamano
  2007-03-16  8:06   ` Johannes Sixt
@ 2007-03-16 12:55   ` Petr Baudis
  2 siblings, 0 replies; 62+ messages in thread
From: Petr Baudis @ 2007-03-16 12:55 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Luiz Fernando N. Capitulino, git

On Fri, Mar 16, 2007 at 05:59:28AM CET, Shawn O. Pearce wrote:
> "Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:
> >  3. I don't code in Perl, is it a problem? I mean, the project's
> > goal is to have a Perl binding but I think it goes far from
> > that: we could have a python module, a C program, or anything
> > that shows the libgit is useful.
> 
> No, I don't see that as a problem at all.  We have some Perl
> experts on the mailing list who would like to see Perl bindings.
> Some of the Perl binding is pure C code, and some of it is this
> weird Perl macro language...  so I expect those Perl experts to come
> out of the woodwork and help the community to create a prototype
> set of bindings.  There's also Ruby and Python interests around,
> so we may see bindings for those too.  ;-)

I'll add the Perl binding as soon as the libgit part is there; the
infrastructure is already in place (not now, but it's in git history; you
just have to dig it out), so it should be pretty easy too. Even if I
don't, someone surely will. ;-) I don't think knowing Perl, let alone
the Perl XS horrors, should be a prerequisite for this project.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 11:54         ` Johannes Schindelin
@ 2007-03-16 13:09           ` Rocco Rutte
  2007-03-16 15:12             ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: Rocco Rutte @ 2007-03-16 13:09 UTC (permalink / raw)
  To: git

Hi,

* Johannes Schindelin [07-03-16 12:54:52 +0100] wrote:

[...]

>Isn't this an awfully long shot?

>I'd be happy if the libification project resulted

>- in a (static!) libgit.a which can be linked to qgit or similar (being 
>  reentrant, or at least optionally so, and not die()ing all the time), 
>  and

>- which does not fix the API yet (at least for the most parts).

>We _can_ -- once we agree on a stable API -- expose _some_ functions in a 
>libgit.so, but that does not have to be the goal for the first step!

First, I think that would be some cleanup "only" since that basically 
would mean to

   1) make all functions die()ing return some value and handle it and
   2) wrap all static vars into structures and pass them around

If you don't choose a design before wrapping things up in structures, 
you'll probably end up having one structure per source file (at least 
too many structures).

Porting things like qgit to it or writing proper perl/python bindings 
is wasted time since you'd have to rewrite all of it once you decided 
which functions to expose and which structures to use (calling the 
main() routines of builtins doesn't count as real libification, it would 
rather be a performance improvement only).

I'd simply try to find a rough consensus on the data structures and the 
layer model before starting the project, solve 1), afterwards implement 
2) according to it. While 2) happens it would make sense to try to 
develop perl, python, C and C++ bindings in parallel to find out early 
enough whether the design details chosen are useful for real consumers 
outside the git-* tools.

You could put big fat warnings everywhere that parts of the API which 
are exposed are heavily unstable and likely subject to change and that 
programmers using them will have to frequently start over. Once it turns 
out that all the git-tools and all "reference consumers" work with it, you 
can do some cleanup to get to the final first API version after the 
libification project is done.

   bye, Rocco
-- 
:wq!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16  5:30   ` Junio C Hamano
  2007-03-16  6:00     ` Shawn O. Pearce
  2007-03-16 12:53     ` Petr Baudis
@ 2007-03-16 13:47     ` Luiz Fernando N. Capitulino
  2007-03-16 14:08       ` Petr Baudis
  2007-03-16 15:16       ` Johannes Schindelin
  2 siblings, 2 replies; 62+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-03-16 13:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

On Thu, 15 Mar 2007 22:30:46 -0700
Junio C Hamano <junkio@cox.net> wrote:

| "Shawn O. Pearce" <spearce@spearce.org> writes:
| 
| >>     o Documentation (eg, doxygen)
| >>     o Unit-tests
| >>     o Add prefix (eg, git_*) to public API functions
| >
| > Yes.  But which functions shall we expose?  ;-)
| 
| Before going into that topic, a bigger question is if we are
| happy with the current internal API and what the goal of
| libification is.  If the libification is going to say that "this
| is a published API so we are not going to change it", I would
| imagine that it would be very hard to accept in the mainline.

 I think you can put it this way: do you want/wish to make
git more useful than it is today?

 If so, such a library is important because it will allow
users to write applications that use git in a reasonable
way.

 It doesn't need to be the next five-zillion-function library
that will provide the wonders of git in several different
ways.

 We could start by fixing the got-an-error-die behavior and
define an _experimental_ API (just a few functions) just to get
data out of git.

 This would be enough to write the Perl binding I think?

-- 
Luiz Fernando N. Capitulino

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 13:47     ` Luiz Fernando N. Capitulino
@ 2007-03-16 14:08       ` Petr Baudis
  2007-03-16 18:38         ` Luiz Fernando N. Capitulino
  2007-03-16 15:16       ` Johannes Schindelin
  1 sibling, 1 reply; 62+ messages in thread
From: Petr Baudis @ 2007-03-16 14:08 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: Junio C Hamano, Shawn O. Pearce, git

On Fri, Mar 16, 2007 at 02:47:15PM CET, Luiz Fernando N. Capitulino wrote:
>  We could start by fixing the got-an-error-die behavior and
> define an _experimental_ API (just a few functions) just to get
> data out of git.
> 
>  This would be enough to write the Perl binding I think?

Actually, well, I've already done this. :-)

The trouble begins when you want to access multiple repositories from
the same process, etc. Without that, writing the Perl binding is
trivial; there's already a hook the binding can use to catch dies, I've
added it.

So, the main point of the work is to define a _good_ API and get rid of
the static state, I guess.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 13:09           ` Rocco Rutte
@ 2007-03-16 15:12             ` Johannes Schindelin
  2007-03-16 15:55               ` Nicolas Pitre
                                 ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 15:12 UTC (permalink / raw)
  To: Rocco Rutte; +Cc: git

Hi,

[please do not cull the Cc: list]

On Fri, 16 Mar 2007, Rocco Rutte wrote:

> First, I think that would be some cleanup "only" since that basically would
> mean to
> 
>   1) make all functions die()ing return some value and handle it and
>   2) wrap all static vars into structures and pass them around
> 
> If you don't choose a design before wrapping things up in structures, you'll
> probably end up having one structure per source file (at least too many
> structures).

Why? For some tasks, it should be 1) easier, 2) more elegant, and 3) 
faster to write a function which re-initialises the static variables.

Of course, if you want to work with multiple repos _at the same time_, 
this does not help you. But frankly, we don't support that with core-git, 
so why should we in libgit?
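
A minimal sketch of the "re-initialise the static variables" approach, assuming a module that keeps its state in file-scope variables. The names (demo_git_dir, demo_reinit() and so on) are made up for this illustration and do not correspond to actual git symbols.

```c
#include <stddef.h>
#include <stdio.h>

/* File-scope state, as a module with static variables would have it. */
static char demo_git_dir[256];
static size_t demo_active_nr;

/*
 * Forget everything learned about the previous repository and point
 * the module at a new one. Only one repository is usable at a time.
 */
void demo_reinit(const char *git_dir)
{
	demo_active_nr = 0;
	snprintf(demo_git_dir, sizeof(demo_git_dir), "%s", git_dir);
}

/* Pretend to load one more entry for the current repository. */
void demo_add_entry(void)
{
	demo_active_nr++;
}

size_t demo_entry_count(void)
{
	return demo_active_nr;
}

const char *demo_current_git_dir(void)
{
	return demo_git_dir;
}
```

A caller can then work on repositories one after another by calling demo_reinit() between them, which keeps every other function signature unchanged, at the price of not being able to interleave work on two repositories.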

> Porting things like qgit to it or writing proper perl/python bindings 
> is wasted time since you'd have to rewrite all of it once you decided 
> which functions to expose and which structures to use (calling the 
> main() routines of builtins doesn't count as real libification, it would 
> rather be a performance improvement only).

Nope. It is _not_ a complete rewrite. More likely, it is minimal 
adjustments. It's not like we will replace apples with cars...

> I'd simply try to find a rough consensus on the data structures and the 
> layer model before starting the project, solve 1), afterwards implement 
> 2) according to it.

We already _have_ the data structures!

Also, in my experience, defining a complete API and only then 
implementing it never works. Rather, start with a _small_ part you want to 
do. Define a clean API _just for that part_. Implement it. Verify that it 
indeed does what it should do (and that means not just _you_ should verify 
it, but it should be stress tested on the list).

We don't have to create the whole world in one day, you know?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 13:47     ` Luiz Fernando N. Capitulino
  2007-03-16 14:08       ` Petr Baudis
@ 2007-03-16 15:16       ` Johannes Schindelin
  1 sibling, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 15:16 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: Junio C Hamano, Shawn O. Pearce, git

Hi,

On Fri, 16 Mar 2007, Luiz Fernando N. Capitulino wrote:

>  It doesn't need to be the next five-zillion-function library that will 
> provide the wonders of git in several different ways.

Yes. Just like we have a really small, really stable part of core-git, 
which can be used by porcelains and is expected to work the same in 
future versions, we could eventually have the same with libgit.

That would mean, for example, that rev_info should always be initialised 
with malloc() so that future versions can make it bigger, and that new 
members be added always at the end.
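
A minimal sketch of that rule, assuming the library alone allocates the structure so callers never depend on its size. struct demo_rev_info and its members are invented for this illustration; they are not the real rev_info.

```c
#include <stdlib.h>

struct demo_rev_info {
	int max_count;		/* present since the first release */
	int show_merges;
	/* new members are only ever appended below this line */
	int first_parent_only;	/* hypothetically added in a later release */
};

/*
 * The only way to obtain a demo_rev_info: the library allocates it and
 * fills in defaults, so a caller built against an older header still
 * gets a correctly sized, fully initialised structure.
 */
struct demo_rev_info *demo_rev_info_new(void)
{
	struct demo_rev_info *revs = calloc(1, sizeof(*revs));

	if (!revs)
		return NULL;
	revs->max_count = -1;	/* no limit */
	revs->show_merges = 1;
	return revs;
}

void demo_rev_info_free(struct demo_rev_info *revs)
{
	free(revs);
}
```

Old callers keep working because the offsets of the members they know about never move, and members they do not know about get sane defaults from the allocator.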

>  We could start by fixing the got-an-error-die behavior and define an 
> _experimental_ API (just a few functions) just to get data out of git.

That sounds very reasonable.

And if it does not work out as expected, we don't have to make it part of 
"official" Git. It can live on as a fork.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 15:12             ` Johannes Schindelin
@ 2007-03-16 15:55               ` Nicolas Pitre
  2007-03-16 16:13                 ` Johannes Schindelin
  2007-03-16 16:17                 ` Shawn O. Pearce
  2007-03-16 18:20               ` Marco Costalba
  2007-03-18 14:08               ` Petr Baudis
  2 siblings, 2 replies; 62+ messages in thread
From: Nicolas Pitre @ 2007-03-16 15:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rocco Rutte, git

On Fri, 16 Mar 2007, Johannes Schindelin wrote:

> We already _have_ the data structures!

Well... Shawn and I are contemplating alternate data structures to 
improve things dramatically.

With a fixed public API I doubt such improvements could be as effective.

One thing that was really done right in the Linux kernel is to _not_ 
have any sort of fixed API at all for drivers.  This is a big upside for 
progress.  Yet the Linux kernel is regarded as highly useful.

So... if any API is to be developed, I'd argue that it must be done 
_above_ the existing code with a higher level of abstraction and a much 
narrower scope.

Nicolas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 15:55               ` Nicolas Pitre
@ 2007-03-16 16:13                 ` Johannes Schindelin
  2007-03-16 16:26                   ` Nicolas Pitre
  2007-03-16 16:17                 ` Shawn O. Pearce
  1 sibling, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 16:13 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Rocco Rutte, git

Hi,

On Fri, 16 Mar 2007, Nicolas Pitre wrote:

> On Fri, 16 Mar 2007, Johannes Schindelin wrote:
> 
> > We already _have_ the data structures!
> 
> Well... Shawn and I are contemplating alternate data structures to 
> improve things dramatically.

I was alluding to rev_info, not pack_window and friends.

> With a fixed public API I doubt such improvements could be as effective.

Just think of the "API" we have for porcelains. It is literally unchanged 
since the beginning. You can even use the original script git-log.sh 
today! _That_ is what I mean by fixed public API: give certain guarantees 
about what will not go away.

> One thing that was really done right in the Linux kernel is to _not_ 
> have any sort of fixed API at all for drivers.  This is a big upside for 
> progress.  Yet the Linux kernel is regarded as highly useful.

Yes. I am a Linux user myself.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 15:55               ` Nicolas Pitre
  2007-03-16 16:13                 ` Johannes Schindelin
@ 2007-03-16 16:17                 ` Shawn O. Pearce
  1 sibling, 0 replies; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-16 16:17 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Johannes Schindelin, Rocco Rutte, git

Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 16 Mar 2007, Johannes Schindelin wrote:
> 
> > We already _have_ the data structures!
> 
> Well... Shawn and I are contemplating alternate data structures to 
> improve things dramatically.

Hang on.  Yes, Nico and I are contemplating alternate disk-based
data structures, and in some cases, alternate memory-based data
structures to improve things.

But these structures are not changing the basic Git data structures
that have been with us since way back when. ;-) Commits still
have the same fields, with the same data and the same meaning.
Trees still have the same fields, and same meaning... etc.

> With a fixed public API I doubt such improvements could be as effective.

They still can be, and without shooting ourselves in the foot in the
process.

> So... if any API is to be developed, I'd argue that it must be done 
> _above_ the existing code with a higher level of abstraction and a much 
> narrower scope.

Yes.  Today we have a frozen API for commit walking.  It's called
`git rev-list --pretty=raw A ^B`.  That output format is pretty
well set in stone, and we cannot change it.  Everyone knows what
each field means, and hopefully knows that additional fields can
be added.  ;-)

Instead of formatting out those fields as hex strings, or as decimal
integer dates, we can offer them in a struct.  E.g.:

	struct git_objid {
		const unsigned char *obj_name;
	};

	struct git_commit {
		struct git_objid tree;
		struct git_objid *parents;
		uint32_t nr_parent;
		const char *author;
		time_t author_date;
		int author_tz;
		const char *committer;
		time_t committer_date;
		int committer_tz;
		const char *message;
	};

With the rule that the pointers are to static memory buffers that
libgit is loaning out to the caller (the caller should *not* free
these buffers).  This lets us play cute tricks down in the lower
tiers by pointing directly into the packfile dictionary tables
(saves memcpys); or xstrdup/xmalloc everything we give out if we
want to be really paranoid.

Just tossing ideas out - don't think that what I wrote above is my
final suggestion on the matter.  It may change in another day or
two if I think about it more.  ;-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 16:13                 ` Johannes Schindelin
@ 2007-03-16 16:26                   ` Nicolas Pitre
  2007-03-16 18:22                     ` Steve Frécinaux
  2007-03-16 23:26                     ` Johannes Schindelin
  0 siblings, 2 replies; 62+ messages in thread
From: Nicolas Pitre @ 2007-03-16 16:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rocco Rutte, git

On Fri, 16 Mar 2007, Johannes Schindelin wrote:

> Hi,
> 
> On Fri, 16 Mar 2007, Nicolas Pitre wrote:
> 
> > On Fri, 16 Mar 2007, Johannes Schindelin wrote:
> > 
> > > We already _have_ the data structures!
> > 
> > Well... Shawn and I are contemplating alternate data structures to 
> > improve things dramatically.
> 
> I was alluding to rev_info, not pack_window and friends.
> 
> > With a fixed public API I doubt such improvements could be as effective.
> 
> Just think of the "API" we have for porcelains. It is literally unchanged 
> since the beginning. You can even use the original script git-log.sh 
> today! _That_ is what I mean by fixed public API: give certain guarantees 
> about what will not go away.

Sure.  But the output from an executable is a damn good abstraction and 
the executable itself is an impenetrable boundary.  Anything can change 
(and did change) underneath.

This is why a public API must be done at a higher level to allow for 
anything to change at the lower level as we wish.


Nicolas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 15:12             ` Johannes Schindelin
  2007-03-16 15:55               ` Nicolas Pitre
@ 2007-03-16 18:20               ` Marco Costalba
  2007-03-16 18:38                 ` Marco Costalba
  2007-03-18 14:08               ` Petr Baudis
  2 siblings, 1 reply; 62+ messages in thread
From: Marco Costalba @ 2007-03-16 18:20 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rocco Rutte, git

On 3/16/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> > Porting things like qgit to it or writing proper perl/python bindings
> > is wasted time since you'd have to rewrite all of it once you decided
> > which functions to expose and which structures to use (calling the
> > main() routines of builtins doesn't count as real libification, it would
> > rather be a performance improvement only).
>
> Nope. It is _not_ a complete rewrite. More likely, it is minimal
> adjustments. It's not like we will replace apples with cars...
>

IMHO the truth is probably in the middle. I wouldn't call it a trivial
port, at least for me, but it would be interesting to experiment with
linking against libgit anyway.

*The most important thing for a libgit to be used by qgit is reentrancy*

Currently an unlimited number of tabs can be open in qgit. I'm not
talking about tabs open on different repos, but about different views of
the same repo: the main view, the file history of file A, the file history
of file B, the tree view (i.e. select some files/directories from the
directory tree and view the revisions that modified that subset of the
repo), and so on. Other views could be added in the future. Because each
view has a dedicated tab and each tab runs its _own_ 'git rev-list'
instance (possibly at the same time as the others), libgit should be able
to support many instances of the libified git-rev-list function running
at the same time.

Perhaps this need currently exists only for qgit among the GUI browsers,
but it is not too difficult to foresee a multi-view GUI interface
becoming a relatively common feature of the remaining crop of git tools
in the future.

    Marco

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 16:26                   ` Nicolas Pitre
@ 2007-03-16 18:22                     ` Steve Frécinaux
  2007-03-16 18:53                       ` Nicolas Pitre
  2007-03-16 23:26                     ` Johannes Schindelin
  1 sibling, 1 reply; 62+ messages in thread
From: Steve Frécinaux @ 2007-03-16 18:22 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Johannes Schindelin, Rocco Rutte, git

On Fri, 2007-03-16 at 12:26 -0400, Nicolas Pitre wrote:

> Sure.  But the output from an executable is a damn good abstraction and 
> the executable itself is an impenetrable boundary.  Anything can change 
> (and did change) underneath.

Strictly speaking, you can use opaque structures for commits and so on
(so that the outside world will only ever see a pointer), and use some
getters/setters for commonly used stuff (like datum, title, content).

Also, I guess what people would expect from a C library is roughly the
same as for the current plumbing... just easier to use from another
program. It doesn't need low-level access to data structures (most
applications would interact with an existing repo or store data
for third-party software, something that is high-level) and I don't
think such an opaque API would be a huge constraint as long as you keep
the Object/Index/Tree/Commit/etc basic opaque structs.
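
A minimal sketch of the opaque-structure idea, assuming a public header that only forward-declares the type and offers accessor functions. struct demo_commit and the demo_commit_*() functions are hypothetical names used for illustration only.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* --- what a public header might contain --- */
struct demo_commit;	/* opaque: callers only ever hold a pointer */

struct demo_commit *demo_commit_create(const char *title, time_t date);
const char *demo_commit_title(const struct demo_commit *c);
time_t demo_commit_date(const struct demo_commit *c);
void demo_commit_free(struct demo_commit *c);

/* --- private to the library; free to change between versions --- */
struct demo_commit {
	time_t date;
	char *title;
};

struct demo_commit *demo_commit_create(const char *title, time_t date)
{
	size_t len = strlen(title) + 1;
	struct demo_commit *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->title = malloc(len);
	if (!c->title) {
		free(c);
		return NULL;
	}
	memcpy(c->title, title, len);
	c->date = date;
	return c;
}

const char *demo_commit_title(const struct demo_commit *c)
{
	return c->title;
}

time_t demo_commit_date(const struct demo_commit *c)
{
	return c->date;
}

void demo_commit_free(struct demo_commit *c)
{
	if (!c)
		return;
	free(c->title);
	free(c);
}
```

Since callers never see the members, the private layout can be reordered or replaced without breaking them, as long as the accessors keep their signatures.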

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 18:20               ` Marco Costalba
@ 2007-03-16 18:38                 ` Marco Costalba
  2007-03-16 18:59                   ` Nicolas Pitre
  2007-03-16 19:09                   ` Andy Parkins
  0 siblings, 2 replies; 62+ messages in thread
From: Marco Costalba @ 2007-03-16 18:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rocco Rutte, git

On 3/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
>
> *The most important thing for a libgit to be used by qgit is reentrancy*
>

Another critical feature is that this call to a git-rev-list-like
function MUST be non-blocking.

Reading a big repo could take many seconds, even more than 10 seconds
in the cold-cache case for the Linux tree, for example. Getting the
history of a file ('git rev-list -- /path/to/file') is also very slow.

There is no way that a GUI tool is allowed to *freeze* for that amount
of time. Currently, because an external process is forked when running
'git rev-list', the whole problem is happily handled by the kernel
scheduler and the QProcess callback mechanism (based on select()). In
case of a libified git-rev-list this could be an issue.

   Marco

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 14:08       ` Petr Baudis
@ 2007-03-16 18:38         ` Luiz Fernando N. Capitulino
  2007-03-16 23:16           ` Shawn O. Pearce
  0 siblings, 1 reply; 62+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-03-16 18:38 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Shawn O. Pearce, git

On Fri, 16 Mar 2007 15:08:55 +0100
Petr Baudis <pasky@suse.cz> wrote:

| On Fri, Mar 16, 2007 at 02:47:15PM CET, Luiz Fernando N. Capitulino wrote:
| >  We could start by fixing the got-an-error-die behavior and
| > define an _experimental_ API (just a few functions) just to get
| > data out of git.
| > 
| >  This would be enough to write the Perl binding I think?
| 
| Actually, well, I've already done this. :-)

 Not exactly, at least not the way I think it should be done.

| The trouble begins when you want to access multiple repositories from
| the same process, etc. Without that, writing the Perl binding is
| trivial; there's already a hook the binding can use to catch dies, I've
| added it.
| 
| So, the main point of the work is to define a _good_ API and get rid of
| the static state, I guess.

 Yes, the set_*_routine()s seem like a workaround to me; you're only
fixing die()'s final effect.

 I think the right solution is to get rid of die() from functions that
are supposed to be an interface, set errno if needed and return -1
or NULL.
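
 A minimal sketch of that convention, assuming an interface function that parses a 40-character hex object name. demo_get_sha1_hex() is a made-up name for illustration, not an existing git function; on bad input it sets errno to EINVAL and returns -1 instead of dying.

```c
#include <errno.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int demo_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Convert a 40-character hex string into 20 raw bytes.
 * Returns 0 on success; on failure returns -1 with errno set to
 * EINVAL and leaves the decision to abort to the caller.
 */
int demo_get_sha1_hex(const char *hex, unsigned char *sha1)
{
	size_t i;

	if (!hex || !sha1) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < 20; i++) {
		int hi = demo_hexval(hex[2 * i]);
		int lo;

		/* a NUL here means the string is too short; stop reading */
		if (hi < 0) {
			errno = EINVAL;
			return -1;
		}
		lo = demo_hexval(hex[2 * i + 1]);
		if (lo < 0) {
			errno = EINVAL;
			return -1;
		}
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	if (hex[40] != '\0') {
		errno = EINVAL;	/* trailing garbage */
		return -1;
	}
	return 0;
}
```

 A command-line tool built on top of such a function can still print a message and exit when it sees -1, while a GUI or a language binding can report the error and carry on.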

 That looks like a lot of work BTW, but I'll be pleased to work on it.

 Are there more things like the set_*_routine()s added to fix
other problems?

-- 
Luiz Fernando N. Capitulino

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 18:22                     ` Steve Frécinaux
@ 2007-03-16 18:53                       ` Nicolas Pitre
  2007-03-18 13:57                         ` Petr Baudis
  0 siblings, 1 reply; 62+ messages in thread
From: Nicolas Pitre @ 2007-03-16 18:53 UTC (permalink / raw)
  To: Steve Frécinaux; +Cc: Johannes Schindelin, Rocco Rutte, git


On Fri, 16 Mar 2007, Steve Frécinaux wrote:

> Also, I guess what people would expect from a C library is roughly the
> same as for the current plumbing... just easier to use from another
> program. It doesn't need low-level access to data structures (most
> applications would interact with an existing repo or store data
> for third-party software, something that is high-level) and I don't
> think such an opaque API would be a huge constraint as long as you keep
> the Object/Index/Tree/Commit/etc basic opaque structs.

Right.  I like that idea.

A good way to define what the lib API needs might then be expressed as 
follows:

  Each existing plumbing command must be turned into the minimal 
  implementation required to interact with the libgit public API and
  display results.

  In other words, the public libgit API should provide the same 
  functionality as existing plumbing commands such that those existing
  commands will only need the necessary code to bridge the C interface
  with the existing command line interface.

Then, of course, there is the matter of reentrancy.  But that's still a 
minor API detail even if it is not a trivial issue implementation-wise.  
But the API must be right as this is what we'll be stuck with even if 
the implementation may change.  And as far as an API definition is 
needed I think that it should reflect the current plumbing which is 
actually the real API that grew naturally and has been proven useful.

Nicolas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 18:38                 ` Marco Costalba
@ 2007-03-16 18:59                   ` Nicolas Pitre
  2007-03-16 21:07                     ` Marco Costalba
  2007-03-16 19:09                   ` Andy Parkins
  1 sibling, 1 reply; 62+ messages in thread
From: Nicolas Pitre @ 2007-03-16 18:59 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Johannes Schindelin, Rocco Rutte, git

On Fri, 16 Mar 2007, Marco Costalba wrote:

> On 3/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
> > 
> > *The most important thing for a libgit to be used by qgit is reentrancy*
> > 
> 
> Another critical feature is that this call to git-rev-list-like
> function MUST be non-blocking.

I'm not sure I agree.

The non-blockingness can be (and probably should be) handled at a higher 
level with your own threading facility of choice.  Making GIT 
restartable has the potential for making the core code much too complex.


Nicolas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 18:38                 ` Marco Costalba
  2007-03-16 18:59                   ` Nicolas Pitre
@ 2007-03-16 19:09                   ` Andy Parkins
  1 sibling, 0 replies; 62+ messages in thread
From: Andy Parkins @ 2007-03-16 19:09 UTC (permalink / raw)
  To: git; +Cc: Marco Costalba, Johannes Schindelin, Rocco Rutte

On Friday 2007, March 16, Marco Costalba wrote:

> There is no way that a GUI tool is allowed to *freeze* for that
> amount of time. Currently, because an external process is forked when
> running 'git rev-list' all the problem is happily handled by the
> kernel scheduler and the QProcess callback mechanism (based on
> select()). In case of a libified git-rev-list this could be an issue.

I don't think that is ever going to be an issue.  At the worst you could 
just fork() and run the libgit command in that.  Threads are fairly 
easy in Qt as well.

In short, I wouldn't worry about libgit blocking - in fact it's almost a 
guarantee that libgit /will/ block; it would be a nightmare to write an 
asynchronous libgit.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-16 18:59                   ` Nicolas Pitre
@ 2007-03-16 21:07                     ` Marco Costalba
  2007-03-16 23:24                       ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: Marco Costalba @ 2007-03-16 21:07 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Johannes Schindelin, Rocco Rutte, git

On 3/16/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 16 Mar 2007, Marco Costalba wrote:
>
> > On 3/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
> > >
> > > *The most important thing for a libgit to be used by qgit is reentrancy*
> > >
> >
> > Another critical feature is that this call to git-rev-list-like
> > function MUST be non-blocking.
>
> I'm not sure I agree.
>
> The non-blockingness can be (and probably should be) handled at a higher
> level with your own threading facility of choice.  Making GIT
> restartable has the potential for making the core code much too complex.
>

The fact is that the solution is complex anyway, moving the complex
code at higher level doesn't simplify the whole issue, it just *moves*
the issue somewhere else.

BTW now qgit is single-threaded (as gitk), you suggest that linking
with libgit it will involve to go on the multi threading side and I
think you are right. But it will be not that easy.

Currently we have both single threaded GUI tools and blocking git
commands and it works nicely not because it's simple but because the
'complex code' is hidden inside the OS process handling and scheduling
stuff.

Linking with a synchronous libgit means, roughly speaking, taking the
'complex code' out of the OS and putting it somewhere in user space,
either in libgit or in the GUI tool linked with the library.

Now, it happens that Qt has good multi-thread support, but this is
just incidental and of course cannot be taken for granted by a git
library that aims to be broadly and, ideally, easily used.

Because we are just speaking (well, writing ;-) ) about a possible
library, I think we could consider what it would take to provide a
callback mechanism in the API, at least for the slowest calls.
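
A callback-style entry point could look roughly like this. It is a sketch under assumptions: walk_revisions(), rev_cb and the canned revision names are hypothetical, and a real walker would read the object database instead of a static array.

```c
#include <stddef.h>

/* Hypothetical callback type: return non-zero to stop the walk early. */
typedef int (*rev_cb)(const char *rev_name, void *data);

/* Stand-in data; a real walker would read from the object database. */
static const char *fake_revs[] = { "rev-a", "rev-b", "rev-c" };

/*
 * Hypothetical walker: calls cb once per revision.  Returns the number
 * of revisions delivered to the callback.
 */
int walk_revisions(rev_cb cb, void *data)
{
	size_t i, n = sizeof(fake_revs) / sizeof(fake_revs[0]);
	int delivered = 0;

	for (i = 0; i < n; i++) {
		delivered++;
		if (cb(fake_revs[i], data))
			break;
	}
	return delivered;
}

/* Example callback: count revisions and stop once a limit is reached. */
struct count_state {
	int seen;
	int limit;
};

int count_cb(const char *rev_name, void *data)
{
	struct count_state *st = data;

	(void)rev_name;
	st->seen++;
	return st->seen >= st->limit;
}
```

Note that the call itself still blocks until the walk ends. The callback only gives the application a hook per item, for example to update a display or abort early.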

    Marco


* Re: Libification project (SoC)
  2007-03-16 18:38         ` Luiz Fernando N. Capitulino
@ 2007-03-16 23:16           ` Shawn O. Pearce
  2007-03-17 19:58             ` Luiz Fernando N. Capitulino
  0 siblings, 1 reply; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-16 23:16 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: Petr Baudis, Junio C Hamano, git

"Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:
>  I think the right solution is to get rid of die() from functions that
> are supposed to be an interface, set errno if needed and return -1
> or NULL.

And then make their callers (if they are above the public API layer)
die instead.  In some cases this might imply an undesirable change
in the error message produced, as necessary details that are included
today would be unavailable in the caller.
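
A minimal sketch of that split, with made-up names (lib_check_index_size() and cmd_check_index() are not git functions): the library layer returns -1 and sets errno, and only the command layer above it decides to exit.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical library-level function: reports failure to the caller
 * by returning -1 and setting errno, instead of calling die().
 */
int lib_check_index_size(long actual, long expected)
{
	if (actual < 0 || expected < 0) {
		errno = EINVAL;
		return -1;
	}
	if (actual < expected) {
		errno = EIO;	/* the specific reason is lost here */
		return -1;
	}
	return 0;
}

/* Command-level caller, above the API layer: this is where we die. */
void cmd_check_index(long actual, long expected)
{
	if (lib_check_index_size(actual, expected) < 0) {
		perror("fatal: cannot read index");
		exit(128);
	}
}
```

The EIO case shows the drawback: the caller can only print a generic message, since errno alone cannot say what exactly was wrong.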
 
>  Is there more things like the set_*_routine()s added to fix
> other problems?

Not that I am aware of.

-- 
Shawn.


* Re: Libification project (SoC)
  2007-03-16 21:07                     ` Marco Costalba
@ 2007-03-16 23:24                       ` Johannes Schindelin
  2007-03-17  7:04                         ` Marco Costalba
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 23:24 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Nicolas Pitre, Rocco Rutte, git

Hi,

On Fri, 16 Mar 2007, Marco Costalba wrote:

> On 3/16/07, Nicolas Pitre <nico@cam.org> wrote:
> > On Fri, 16 Mar 2007, Marco Costalba wrote:
> > 
> > > On 3/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
> > > >
> > > > *The most important thing for a libgit to be used by qgit is 
> > > > reentrancy*
> > > >
> > >
> > > Another critical feature is that this call to git-rev-list-like
> > > function MUST be non-blocking.
> > 
> > I'm not sure I agree.

I am sure I don't agree.

> > The non-blockingness can be (and probably should be) handled at a 
> > higher level with your own threading facility of choice.  Making GIT 
> > restartable has the potential for making the core code much too 
> > complex.
> 
> The fact is that the solution is complex anyway, moving the complex code 
> at higher level doesn't simplify the whole issue, it just *moves* the 
> issue somewhere else.

It not only *moves* the issue somewhere else, but it also cleanly 
separates the issues.

> BTW now qgit is single-threaded (as gitk), you suggest that linking with 
> libgit it will involve to go on the multi threading side and I think you 
> are right. But it will be not that easy.

Why?

First, it _is_ multi-threaded, since it calls external programs. That is 
even more than a thread. It is a process.

Second, it _would_ be easy to just use the threads provided by Qt.

> Because we are just speaking (well, writing ;-) ) about a possible 
> library I think we could take in account what would involve to foreseen 
> a callback mechanism in the API, at least for the slowest ones.

We are talking about libgit. Which should make access to certain common 
functions on Git repositories easy. Nothing more than that.

If you need to do that asynchronously, do _not_ fiddle with libgit. Just 
imagine what this would involve: you'd have to have timeouts (since there 
is _NO_ other way to find out when to return with empty hands, instead of 
blocking), which is _not_ portable. You'd soon be in the same _mess_ we 
are talking about with respect to exceptions.

Also, you would make _all_ operations expensive, since they _would_ have 
to store state to be restartable.

The common solution for your problem _is_ to use threads.

And you have to admit that _only_ viewers would need asynchronous access 
anyway. I doubt that other tools -- which could take advantage of a 
libgit -- would need such access.

Ciao,
Dscho


* Re: Libification project (SoC)
  2007-03-16 16:26                   ` Nicolas Pitre
  2007-03-16 18:22                     ` Steve Frécinaux
@ 2007-03-16 23:26                     ` Johannes Schindelin
  1 sibling, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-16 23:26 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Rocco Rutte, git

Hi,

On Fri, 16 Mar 2007, Nicolas Pitre wrote:

> [...] the output from an executable is a damn good abstraction and the 
> executable itself is an impenetrable boundary.  Anything can change (and 
> did change) underneath.
> 
> This is why a public API must be done at a higher level to allow for 
> anything to change at the lower level as we wish.

Absolutely.

Ciao,
Dscho


* Re: Libification project (SoC)
  2007-03-16  4:24 Libification project (SoC) Luiz Fernando N. Capitulino
  2007-03-16  4:59 ` Shawn O. Pearce
@ 2007-03-17  2:24 ` Jakub Narebski
  2007-03-17  5:22   ` Shawn O. Pearce
  1 sibling, 1 reply; 62+ messages in thread
From: Jakub Narebski @ 2007-03-17  2:24 UTC (permalink / raw)
  To: git

[Cc: git@vger.kernel.org]

Luiz Fernando N. Capitulino wrote:

>     o Documentation (eg, doxygen)

I wonder if documenting and finishing documentation of git storage structure
(format description of: loose objects, packs, pack indices, index, refs and
symbolic refs, packed refs) and git protocols (git protocol description,
local/ssh fetch/push pipeline description), perhaps using RFC or RFC-like
notation could (and should) be made part of libification effort...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: Libification project (SoC)
  2007-03-17  2:24 ` Jakub Narebski
@ 2007-03-17  5:22   ` Shawn O. Pearce
  0 siblings, 0 replies; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-17  5:22 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, lcapitulino

Jakub Narebski <jnareb@gmail.com> wrote:
> [Cc: git@vger.kernel.org]
> 
> Luiz Fernando N. Capitulino wrote:
> 
> >     o Documentation (eg, doxygen)
> 
> I wonder if documenting and finishing documentation of git storage structure
> (format description of: loose objects, packs, pack indices, index, refs and
> symbolic refs, packed refs) and git protocols (git protocol description,
> local/ssh fetch/push pipeline description), perhaps using RFC or RFC-like
> notation could (and should) be made part of libification effort...

I would consider that out of scope for this project.

It would be nice if someone did this work, or at least dusted
off "A Large Angry SCM"'s document and made that available in the
Documentation/technical folder.  But I don't think it should be part
of the Libification SoC project, or any of our other current ideas.

Users of a public API don't need to know the internal formatting
of an object within a packfile.  They do however need to know that
a commit has a tree, and 0-n parents.  And that's already covered
in our existing docs.

And *please* stop breaking the CC chains Jakub.  We've asked you
to not do that.  I had to go look up Luiz' email address so I could
get him back onto it.

-- 
Shawn.


* Re: Libification project (SoC)
  2007-03-16 23:24                       ` Johannes Schindelin
@ 2007-03-17  7:04                         ` Marco Costalba
  2007-03-17 17:29                           ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: Marco Costalba @ 2007-03-17  7:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Nicolas Pitre, Rocco Rutte, git

On 3/17/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> We are talking about libgit. Which should make access to certain common
> functions on Git repositories easy. Nothing more than that.
>

Fair enough.

> If you need to do that asynchronously, do _not_ fiddle with libgit. Just
> imagine what this would involve: you'd have to have timeouts (since there
> is _NO_ other way to find out when to return with empty hands, instead of
> blocking), which is _not_ portable. You'd soon be in the same _mess_ we
> are talking about with respect to exceptions.
>
> Also, you would make _all_ operations expensive, since they _would_ have
> to store state to be restartable.
>
> The common solution for your problem _is_ to use threads.
>

I would say the common solution for a non-blocking libgit is to use
threads in the tool linked with libgit.

This is clearly a design choice, and I agree it's important to keep
libgit simple and portable (otherwise you'd probably need to use a
thread library such as pthread in libgit). The thread facility in Qt,
by contrast, is already portable and well integrated. Anyway, it's a
design choice perhaps worth documenting.

> And you have to admit that _only_ viewers would need asynchronous access
> anyway. I doubt that other tools -- which could take their advantage of a
> libgit -- would need such an access.
>

Yes, and you have to admit ;-)  that viewers are the tools that will
mostly use libgit.

    Marco


* Re: Libification project (SoC)
  2007-03-17  7:04                         ` Marco Costalba
@ 2007-03-17 17:29                           ` Johannes Schindelin
  0 siblings, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-17 17:29 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Nicolas Pitre, Rocco Rutte, git

Hi,

On Sat, 17 Mar 2007, Marco Costalba wrote:

> On 3/17/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> > The common solution for your problem _is_ to use threads.
> 
> I would say, the common solution to have non blocking libgit is to use 
> threads in the tool linked with libgit.

Yes, that's what I tried to say.

> > And you have to admit that _only_ viewers would need asynchronous 
> > access anyway. I doubt that other tools -- which could take their 
> > advantage of a libgit -- would need such an access.
> 
> Yes, and you have to admit ;-)  that viewers are the tools that mostly 
> will use libgit.

I hope that there are many more users. _And_ not all viewers want to do 
the display asynchronously. For example, statplot takes the time it takes.

Ciao,
Dscho


* Re: Libification project (SoC)
  2007-03-16 23:16           ` Shawn O. Pearce
@ 2007-03-17 19:58             ` Luiz Fernando N. Capitulino
  2007-03-18  5:23               ` Shawn O. Pearce
  0 siblings, 1 reply; 62+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-03-17 19:58 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Petr Baudis, Junio C Hamano, git

On Fri, 16 Mar 2007 19:16:46 -0400
"Shawn O. Pearce" <spearce@spearce.org> wrote:

| "Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:
| >  I think the right solution is to get rid of die() from functions that
| > are supposed to be an interface, set errno if needed and return -1
| > or NULL.
| 
| And then make their callers (if they are above the public API layer)
| die instead.  In some cases this might imply an undesirable change
| in the error message produced, as necessary details that are included
| today would be unavailable in the caller.

 Exactly!

 One simple example of an important error message that would be
lost can be found in read-cache.c:read_cache_from():

 o index file smaller than expected

 I've found a possible solution, though.

 Take a look at Rusty's solution for the same problem in
module-init-tools:

"""
/* We use error numbers in a loose translation... */
static const char *insert_moderror(int err)
{
	switch (err) {
	case ENOEXEC:
		return "Invalid module format";
	case ENOENT:
		return "Unknown symbol in module, or unknown parameter (see dmesg)";
	case ENOSYS:
		return "Kernel does not have module support";
	default:
		return strerror(err);
	}
}
"""

 Instead of calling strerror() directly for errors generated
when inserting a module, the insmod() function calls insert_moderror(),
which provides the desired mapping.

 I think we could have something like that for each of git's
modules, e.g., git_cache_strerror(), git_commit_strerror() and so on.
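
 A rough sketch of what such a per-module mapping might look like. The error codes and the second message are invented for illustration; only the function name and the "smaller than expected" text come from this mail.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical module-specific codes, kept negative to avoid errno clashes. */
#define GIT_CACHE_ETOOSMALL (-1)
#define GIT_CACHE_EBADSIG   (-2)

/* Loose translation of cache error codes, falling back to strerror(). */
const char *git_cache_strerror(int err)
{
	switch (err) {
	case GIT_CACHE_ETOOSMALL:
		return "index file smaller than expected";
	case GIT_CACHE_EBADSIG:
		return "bad index signature";
	default:
		return strerror(err);
	}
}
```

 The mapping only works for messages that are fixed strings; anything that needs a file name or a number in it does not fit this scheme.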

 Does this look reasonable?

-- 
Luiz Fernando N. Capitulino


* Re: Libification project (SoC)
  2007-03-17 19:58             ` Luiz Fernando N. Capitulino
@ 2007-03-18  5:23               ` Shawn O. Pearce
  2007-03-18  5:52                 ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-18  5:23 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: Petr Baudis, Junio C Hamano, git

"Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br> wrote:
> On Fri, 16 Mar 2007 19:16:46 -0400
> "Shawn O. Pearce" <spearce@spearce.org> wrote:
> | And then make their callers (if they are above the public API layer)
> | die instead.  In some cases this might imply an undesirable change
> | in the error message produced, as necessary details that are included
> | today would be unavailable in the caller.
> 
>  I've found a possible solution, though.
> 
>  Take a look at Rusty's solution for the same problem in
> module-init-tools:
> 
> """
> /* We use error numbers in a loose translation... */
> static const char *insert_moderror(int err)
> {
> 	switch (err) {
> 	case ENOEXEC:
> 		return "Invalid module format";
> 	case ENOENT:
> 		return "Unknown symbol in module, or unknown parameter (see dmesg)";
> 	case ENOSYS:
> 		return "Kernel does not have module support";
> 	default:
> 		return strerror(err);
> 	}
> }
> """

Take a look at sha1_file.c, open_packed_git_1:

...
    if (!pack_version_ok(hdr.hdr_version))
        return error("packfile %s is version %u and not supported"
            " (try upgrading GIT to a newer version)",
            p->pack_name, ntohl(hdr.hdr_version));
...

Here we are supplying a lot more than just a simple error code
that can be mapped to a static string.

Of course that code is currently feeding it to the error function,
which today calls the error_routine (see usage.c).  We could buffer
the strings sent to error()/warn() and let the caller obtain all
strings that occurred during the last API call.

-- 
Shawn.


* Re: Libification project (SoC)
  2007-03-18  5:23               ` Shawn O. Pearce
@ 2007-03-18  5:52                 ` Junio C Hamano
  2007-03-18 16:18                   ` Luiz Fernando N. Capitulino
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2007-03-18  5:52 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Luiz Fernando N. Capitulino, Petr Baudis, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Take a look at sha1_file.c, open_packed_git_1:
>
> ...
>     if (!pack_version_ok(hdr.hdr_version))
>         return error("packfile %s is version %u and not supported"
>             " (try upgrading GIT to a newer version)",
>             p->pack_name, ntohl(hdr.hdr_version));
> ...
>
> Here we are supplying a lot more than just a simple error code
> that can be mapped to a static string.
>
> Of course that code is currently feeding it to the error function,
> which today calls the error_routine (see usage.c).  We could buffer
> the strings sent to error()/warn() and let the caller obtain all
> strings that occurred during the last API call.

Actually, since we are talking about the error path,

 (1) we do not care about the performance of what happens there that much, but
 (2) we *do* care about not doing extra allocation.

So it might make sense to have a preallocated "error string"
buffer, sprintf the error message in there and return error
codes.
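
Something along these lines, as a sketch only: lib_error(), lib_last_error() and check_pack_version() are hypothetical names, the buffer size is arbitrary, and the version check is simplified. It uses vsnprintf() rather than plain sprintf() so an overlong message is truncated instead of overflowing the buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Preallocated buffer: no allocation on the error path. */
static char lib_errbuf[1024];

/*
 * Hypothetical replacement for error(): formats into the static buffer
 * and returns -1 so callers can write "return lib_error(...)".
 */
int lib_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(lib_errbuf, sizeof(lib_errbuf), fmt, ap);
	va_end(ap);
	return -1;
}

/* Lets the caller fetch the message from the most recent failure. */
const char *lib_last_error(void)
{
	return lib_errbuf;
}

/* Example failure site, loosely modelled on the pack version check. */
int check_pack_version(const char *name, unsigned version)
{
	if (version != 2 && version != 3)
		return lib_error("packfile %s is version %u and not supported",
				 name, version);
	return 0;
}
```

A single static buffer keeps only the last message and is shared by every caller in the process, so it would need more thought if the library is ever used from several threads.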


* Re: Libification project (SoC)
  2007-03-16 18:53                       ` Nicolas Pitre
@ 2007-03-18 13:57                         ` Petr Baudis
  0 siblings, 0 replies; 62+ messages in thread
From: Petr Baudis @ 2007-03-18 13:57 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Steve Frécinaux, Johannes Schindelin, Rocco Rutte, git

On Fri, Mar 16, 2007 at 07:53:06PM CET, Nicolas Pitre wrote:
> A good way to define the lib API needs then might be expressed as 
> follows:
> 
>   Each existing plumbing commands must be turned into the minimal 
>   implementation required to interact with the libgit public API and
>   display results.
> 
>   In other words, the public libgit API should provide the same 
>   functionality as existing plumbing commands such that those existing
>   commands will only need the necessary code to bridge the C interface
>   with the existing command line interface.

I think this is a good definition if interpreted well - that is, a git-log
library equivalent shouldn't spew out textual output but provide an
interface to retrieve revision information in an easy-to-use format.

> Then, of course, there is the matter of reentrancy.  But that's still a 
> minor API detail even if it is not a trivial issue implementation wise.  
> But the API must be right as this is what we'll be stuck with even if 
> the implementation may change.  And as far as an API definition is 
> needed I think that it should reflect the current plumbing which is 
> actually the real API that grew naturally and has been proven useful.

Well what you said about reentrancy is that "it's a minor API detail but
even minor API details must be right because we will be stuck with
them". And I don't think it's minor at all either. :-)

Also, even if the implementation won't be completely re-entrant
initially, the question of re-entrancy is something we should decide
since it still affects the scope of the librarification work.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett


* Re: Libification project (SoC)
  2007-03-16 15:12             ` Johannes Schindelin
  2007-03-16 15:55               ` Nicolas Pitre
  2007-03-16 18:20               ` Marco Costalba
@ 2007-03-18 14:08               ` Petr Baudis
  2007-03-18 23:48                 ` Johannes Schindelin
  2 siblings, 1 reply; 62+ messages in thread
From: Petr Baudis @ 2007-03-18 14:08 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rocco Rutte, git

On Fri, Mar 16, 2007 at 04:12:17PM CET, Johannes Schindelin wrote:
> Hi,
> 
> [please do not cull the Cc: list]
> 
> On Fri, 16 Mar 2007, Rocco Rutte wrote:
> 
> > First, I think that would be some cleanup "only" since that basically would
> > mean to
> > 
> >   1) make all functions die()ing return some value and handle it and
> >   2) wrap all static vars into structures and pass them around
> > 
> > If you don't choose a design before wrapping things up in structures, you'll
> > probably end up having one structure per source file (at least too many
> > structures).
> 
> Why? For some tasks, it should be 1) easier, 2) more elegant, and 3) 
> faster to write a function which re-initialises the static variables.
> 
> Of course, if you want to work with multiple repos _at the same time_, 
> this does not help you. But frankly, we don't support that with core-git, 
> so why should we in libgit?

Because you don't know who will want to use libgit. Maybe perl bindings
from inside of mod_perl, where a single process can multiplex between many
repositories based on whichever request just arrived. You talked about
memory usage issues, but I think that's just a minor technical issue
that can be adjusted, while this is _conceptual_. Maybe someone will
want to write repodiff which looks at two repositories and compares them
(without fetching massive data around). Maybe someone will want to write
some other cool hack we didn't think about.

Because in the other subthread you just suggested the git viewers should
be multi-threaded. Of course you can state that "only a single thread
can use libgit at a time", but then multithreading is just a hack to
work around libgit limitations (albeit still legitimate) while it could
be used to do so much more cool stuff like fetching old history
information in the background while you can already _work_ with the tool and
look at the new stuff details (isn't this actually exactly how gitk and
qgit already work? they couldn't with non-reentrant libgit!).

Because if you look at the UNIX history, you'll notice that first people
started with non-reentrant stuff because it was "good enough" and then
came back later and added reentrant versions anyway. Let's learn from
history. It's a question of probability but it's very likely this will
happen to us as well.

This is why the _API_ should be designed to be re-entrant. The
implementation may not be re-entrant right away, it may take a while to
get there, but the API really should be.
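
To illustrate what a re-entrant API shape means here, a minimal sketch with hypothetical names (struct git_repo and the git_repo_*() functions are invented for this example): every call takes an explicit handle, so no state hides in static variables.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: all per-repository state lives here, not in statics. */
struct git_repo {
	char *git_dir;
	int objects_read;	/* example of state that used to be global */
};

/* Allocates a handle; returns NULL if out of memory. */
struct git_repo *git_repo_open(const char *git_dir)
{
	struct git_repo *repo = calloc(1, sizeof(*repo));
	size_t len;

	if (!repo)
		return NULL;
	len = strlen(git_dir) + 1;
	repo->git_dir = malloc(len);
	if (!repo->git_dir) {
		free(repo);
		return NULL;
	}
	memcpy(repo->git_dir, git_dir, len);
	return repo;
}

/* Every call takes the handle, so two repos never share hidden state. */
int git_repo_read_object(struct git_repo *repo)
{
	return ++repo->objects_read;
}

void git_repo_close(struct git_repo *repo)
{
	if (!repo)
		return;
	free(repo->git_dir);
	free(repo);
}
```

An early implementation could still keep a single global behind the handle and refuse a second git_repo_open(); callers written against this shape would not need to change when that restriction is lifted.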

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett


* Re: Libification project (SoC)
  2007-03-18  5:52                 ` Junio C Hamano
@ 2007-03-18 16:18                   ` Luiz Fernando N. Capitulino
  2007-03-18 19:31                     ` Junio C Hamano
  2007-03-18 21:15                     ` Nicolas Pitre
  0 siblings, 2 replies; 62+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-03-18 16:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Petr Baudis, git

On Sat, 17 Mar 2007 22:52:52 -0700
Junio C Hamano <junkio@cox.net> wrote:

| "Shawn O. Pearce" <spearce@spearce.org> writes:
| 
| > Take a look at sha1_file.c, open_packed_git_1:
| >
| > ...
| >     if (!pack_version_ok(hdr.hdr_version))
| >         return error("packfile %s is version %u and not supported"
| >             " (try upgrading GIT to a newer version)",
| >             p->pack_name, ntohl(hdr.hdr_version));
| > ...
| >
| > Here we are supplying a lot more than just a simple error code
| > that can be mapped to a static string.
| >
| > Of course that code is currently feeding it to the error function,
| > which today calls the error_routine (see usage.c).  We could buffer
| > the strings sent to error()/warn() and let the caller obtain all
| > strings that occurred during the last API call.
| 
| Actually, since we are talking about the error path,
| 
|  (1) we do not care performance of what happens there that much, but
|  (2) we *do* care about not doing extra allocation.
| 
| So it might make sense to have a preallocated "error string"
| buffer, sprintf the error message in there and return error
| codes.

 Another possibility is to let the caller do the job.

 I mean, if the information needed to print the error message (packfile
name and version in this example) is available to the caller, or the
caller can get it some way, then the caller could check which error
he got and build the message himself.

 That seems simpler to me, considering the caller has the needed
info, of course...

-- 
Luiz Fernando N. Capitulino


* Re: Libification project (SoC)
  2007-03-18 16:18                   ` Luiz Fernando N. Capitulino
@ 2007-03-18 19:31                     ` Junio C Hamano
  2007-03-19 16:09                       ` Luiz Fernando N. Capitulino
  2007-03-18 21:15                     ` Nicolas Pitre
  1 sibling, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2007-03-18 19:31 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: Shawn O. Pearce, Petr Baudis, git

"Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br>
writes:

>  I mean, if the information needed to print the error message (packfile
> name and version in this example) is available to the caller, or the
> caller can get it someway, then the caller could check which error
> he got and build the message himself.
>
>  That seems simpler to me, considering the caller has the needed
> info, of course...

It's a possibility, but that would make it much less nice to
diagnose and debug problems, as the caller does not usually have
necessary information.

The caller may ask for object A, and the error is triggered
because a different object C is missing, which is the delta base
of object B which in turn is the delta base of object A.  The
best your "caller" can say is "cannot read object A for some
reason", and it cannot say "cannot read object A because object
C is missing".


* Re: Libification project (SoC)
  2007-03-18 16:18                   ` Luiz Fernando N. Capitulino
  2007-03-18 19:31                     ` Junio C Hamano
@ 2007-03-18 21:15                     ` Nicolas Pitre
  1 sibling, 0 replies; 62+ messages in thread
From: Nicolas Pitre @ 2007-03-18 21:15 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino
  Cc: Junio C Hamano, Shawn O. Pearce, Petr Baudis, git

On Sun, 18 Mar 2007, Luiz Fernando N. Capitulino wrote:

>  Other possibility is to let the caller do the job.
> 
>  I mean, if the information needed to print the error message (packfile
> name and version in this example) is available to the caller, or the
> caller can get it someway, then the caller could check which error
> he got and build the message himself.

Nah...  The error details should be handled at the failure location.  
Any error code based mechanism is bound to get out of synch at some 
point, or people simply won't bother adding new codes for new error 
conditions but simply reuse an existing generic enough code instead.

We already have this nice error() function.  Right now it simply dumps 
the message to stderr but it could be made more sophisticated if needed.
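
One way "more sophisticated" could go, sketched with invented names (lib_set_error_fn(), lib_report_error() and capture_error() are not existing functions): the failure site still formats the full details, but the application chooses where the message ends up.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*error_fn)(const char *fmt, va_list ap);

/* Default behaviour: dump the message to stderr. */
static void error_to_stderr(const char *fmt, va_list ap)
{
	fputs("error: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
}

static error_fn current_error_fn = error_to_stderr;

/* Hypothetical hook: lets an application install its own handler. */
void lib_set_error_fn(error_fn fn)
{
	current_error_fn = fn ? fn : error_to_stderr;
}

/* Failure sites call this with full details; always returns -1. */
int lib_report_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	current_error_fn(fmt, ap);
	va_end(ap);
	return -1;
}

/* Example handler: capture the message instead of printing it. */
static char captured[256];

void capture_error(const char *fmt, va_list ap)
{
	vsnprintf(captured, sizeof(captured), fmt, ap);
}

const char *captured_error(void)
{
	return captured;
}
```

With this shape no error-code table has to be kept in sync, since the text is built where the failure is detected.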

Nicolas


* Re: Libification project (SoC)
  2007-03-18 14:08               ` Petr Baudis
@ 2007-03-18 23:48                 ` Johannes Schindelin
  2007-03-19  1:21                   ` Petr Baudis
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-18 23:48 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Rocco Rutte, git

Hi,

On Sun, 18 Mar 2007, Petr Baudis wrote:

> [...] if you look at the UNIX history, you'll notice that first people 
> started with non-reentrant stuff because it was "good enough" and then 
> came back later and added reentrant versions anyway. Let's learn from 
> history. It's question of probability but it's very likely this will 
> happen to us as well.

Yes, let's learn from history. Start with a libgit that is good enough. 
And when somebody actually needs it to behave a little differently, or 
more sophisticated, then let that somebody work on it!

Ciao,
Dscho


* Re: Libification project (SoC)
  2007-03-18 23:48                 ` Johannes Schindelin
@ 2007-03-19  1:21                   ` Petr Baudis
  2007-03-19  1:43                     ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: Petr Baudis @ 2007-03-19  1:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Rocco Rutte, git

  Hi,

On Mon, Mar 19, 2007 at 12:48:27AM CET, Johannes Schindelin wrote:
> On Sun, 18 Mar 2007, Petr Baudis wrote:
> 
> > [...] if you look at the UNIX history, you'll notice that first people 
> > started with non-reentrant stuff because it was "good enough" and then 
> > came back later and added reentrant versions anyway. Let's learn from 
> > history. It's question of probability but it's very likely this will 
> > happen to us as well.
> 
> Yes, let's learn from history. Start with a libgit that is good enough. 
> And when somebody actually needs it to behave a little differently, or 
> more sophisticated, then let that somebody work on it!

  I was talking about the API. The API has to be designed to be
reentrant. And you get pretty much stuck with the API. And requiring
reentrance isn't that far off once libgit is there, as I tried to point
out; it's not really any obscure requirement.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett


* Re: Libification project (SoC)
  2007-03-19  1:21                   ` Petr Baudis
@ 2007-03-19  1:43                     ` Johannes Schindelin
  2007-03-19  2:56                       ` Theodore Tso
  2007-03-19  7:01                       ` Marco Costalba
  0 siblings, 2 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-19  1:43 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Rocco Rutte, git

Hi,

On Mon, 19 Mar 2007, Petr Baudis wrote:

> On Mon, Mar 19, 2007 at 12:48:27AM CET, Johannes Schindelin wrote:
> > On Sun, 18 Mar 2007, Petr Baudis wrote:
> > 
> > > [...] if you look at the UNIX history, you'll notice that first 
> > > people started with non-reentrant stuff because it was "good enough" 
> > > and then came back later and added reentrant versions anyway. Let's 
> > > learn from history. It's question of probability but it's very 
> > > likely this will happen to us as well.
> > 
> > Yes, let's learn from history. Start with a libgit that is good 
> > enough. And when somebody actually needs it to behave a little 
> > differently, or more sophisticated, then let that somebody work on it!
> 
>   I was talking about the API. The API has to be designed to be 
> reentrant. And you get pretty much stuck with the API. And requiring 
> reentrance isn't that far off once libgit is there, as I tried to point 
> out; it's not really any obscure requirement.

I don't see _any_ problem in making an API which works with _one_ repo 
first. This has several advantages:

- most users (if any!) will work that way,

- it is easier to implement,

- you are more likely to get that right than the more complex thing you 
  seem to want already in the first version, and

- it is easy enough to extend the API later, _retaining_ the small and 
  beautiful functions.

As for the memory problems I was pointing out to you on IRC: if you do 
some operation on one repo, and run out of memory, okay, there is not much 
you can do about it. Tough luck.

If you cache different repos in the _same_ process, and run out of memory, 
you should free the caches of the _other_ repos first, instead of just 
erroring out. This is not entirely trivial, likely to make libgit fragile, 
and quite possibly a performance hit (making libgit unattractive for 
plumbing, which would take away the best test case for libgit).

Also, when you cache different repos, you want to avoid duplicating 
identical objects in different caches, which makes the cache handling no 
easier.

But even if these issues would not exist, isn't it obvious that you should 
start with something _simple_?

Ciao,
Dscho


* Re: Libification project (SoC)
  2007-03-19  1:43                     ` Johannes Schindelin
@ 2007-03-19  2:56                       ` Theodore Tso
  2007-03-19  3:55                         ` Shawn O. Pearce
                                           ` (2 more replies)
  2007-03-19  7:01                       ` Marco Costalba
  1 sibling, 3 replies; 62+ messages in thread
From: Theodore Tso @ 2007-03-19  2:56 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Petr Baudis, Rocco Rutte, git

On Mon, Mar 19, 2007 at 02:43:54AM +0100, Johannes Schindelin wrote:
> >   I was talking about the API. The API has to be designed to be 
> > reentrant. And you get pretty much stuck with the API. And requiring 
> > reentrance isn't that far off once libgit is there, as I tried to point 
> > out; it's not really any obscure requirement.
> 
> - it is easy enough to extend the API later, _retaining_ the small and 
>   beautiful functions.

Um, look at what we had to do with gethostbyname() and
gethostbyname_r().  It wasn't possible to sweep through and fix all of
the programs that used gethostbyname(), even though a program that
called gethostbyname() and then called a library function which,
unknown to the application, might do a DNS or YP lookup (and
whose behavior could change depending on some config file like
/etc/nsswitch.conf) would have its static information blown away.  So
if the application tried to use the information returned by _its_
call to gethostbyname() after calling some other library function, it
could get some completely random hostname that wasn't what it
expected.

Yelch!  And so we have two API's that libc has to support,
gethostbyname(), and gethostbyname_r(), with the ugly _r() suffix, and
which in a sane world most programs should use since otherwise they
can be incredibly fragile unless the _first_ thing they do after
calling gethostbyname is to copy the information to someplace stable,
instead of relying on the static buffer to remain sane.  (And yet they
don't, which means bugs that only show up if optional YP or Hesiod
lookups are enabled, etc.)
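
A minimal sketch of that failure mode, for anyone who hasn't been bitten
by it.  Everything here is made up for illustration (lookup_static(),
lookup_r(), helper_that_also_resolves() and struct host_info are
hypothetical stand-ins, not the libc resolver): one flavor hands back a
pointer to a static record, the other writes into storage owned by the
caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a resolver result; NOT the real libc type. */
struct host_info {
	char name[64];
};

/* Non-reentrant flavor: every call overwrites the same static record. */
static struct host_info *lookup_static(const char *name)
{
	static struct host_info result;

	snprintf(result.name, sizeof(result.name), "%s", name);
	return &result;
}

/* Reentrant flavor: the caller owns the storage. */
static int lookup_r(const char *name, struct host_info *out)
{
	assert(out != NULL);
	if (strlen(name) >= sizeof(out->name))
		return -1;
	strcpy(out->name, name);
	return 0;
}

/* A library routine that does its own lookup behind the caller's back. */
static void helper_that_also_resolves(void)
{
	(void)lookup_static("nis.example.org");
}
```

With the static flavor, a pointer obtained before calling
helper_that_also_resolves() ends up naming the helper's host.  With
lookup_r() the caller's copy is untouched, which is the whole point of
the _r() variant.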

Berkeley got it horribly wrong when it tried to start with the "small
and beautiful" functions that were non-reentrant, and we've been
paying the price ever since.  Do we really want to support two
versions of the API forever?  Is it really that hard to support a
reentrant API from the beginning?  I'd submit that the answers to these
two questions are no, and no, respectively.

						- Ted

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  2:56                       ` Theodore Tso
@ 2007-03-19  3:55                         ` Shawn O. Pearce
  2007-03-19 14:57                         ` Johannes Schindelin
  2007-03-19 16:28                         ` Linus Torvalds
  2 siblings, 0 replies; 62+ messages in thread
From: Shawn O. Pearce @ 2007-03-19  3:55 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Johannes Schindelin, Petr Baudis, Rocco Rutte, git

Theodore Tso <tytso@mit.edu> wrote:
> Berkeley got it horribly wrong when it tried to start with the "small
> and beautiful" functions that were non-reentrant, and we've been
> paying the price ever since.  Do we really want to support two
> versions of the API forever?  Is it really that hard to support a
> reentrant API from the beginning?  I'd submit the answer to these two
> questions are no, and no, respectively.

I agree entirely, for every reason mentioned by Ted (including
those not quoted).  ;-)

I learned about gethostbyname after gethostbyname_r was already
introduced, so I have always been asking myself "uhhhhh, why do we
have gethostbyname?".  ;-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  1:43                     ` Johannes Schindelin
  2007-03-19  2:56                       ` Theodore Tso
@ 2007-03-19  7:01                       ` Marco Costalba
  2007-03-19  9:46                         ` Steve Frécinaux
                                           ` (2 more replies)
  1 sibling, 3 replies; 62+ messages in thread
From: Marco Costalba @ 2007-03-19  7:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Petr Baudis, Rocco Rutte, git, tytso, spearce

On 3/19/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> I don't see _any_ problem in making an API which works with _one_ repo
> first. This has several advantages:
>
> - most users (if any!) will work that way,
>

Sometimes it could be useful to write a list of possible users before
starting to code.

Which, in your opinion, are the possible tools that could use a
non-reentrant, blocking libgit? If the tool already exists,
please give its name; if it's a 'would be' one, give a brief
description.

I have tried to make the list myself, but I found only viewers ;-)
among the _current_ tools I know of, and all the viewers allow loading
in the background _now_, so they will not be portable to libgit without
major surgery, read multi-threading (BTW none is currently multi-threaded).

   Marco

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  7:01                       ` Marco Costalba
@ 2007-03-19  9:46                         ` Steve Frécinaux
  2007-03-19 10:33                         ` Steve Frécinaux
  2007-03-19 12:37                         ` Johannes Schindelin
  2 siblings, 0 replies; 62+ messages in thread
From: Steve Frécinaux @ 2007-03-19  9:46 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git

On Mon, 2007-03-19 at 08:01 +0100, Marco Costalba wrote:

> I have tried to make the list myself, but I found only viewers ;-)
> among the _current_ tools I know of, and all the viewers allow loading
> in the background _now_, so they will not be portable to libgit without
> major surgery, read multi-threading (BTW none is currently multi-threaded).

I thought about configuration tools (gconf, kconfig, etc.) that could
then implement something similar to what the recovery system of WinXP
does: they could store a history of the configuration state, and then
recover a previous state if things go wrong. This would be incredibly
useful for system administrators.

Also, more generally, git can be used as a versioned storage system
without direct link to source control. I'm thinking about ikiwiki for
instance.

More SCM-oriented, a cron script that manages a website by checking out
several repositories (one for the wiki module, another for the blog
module, another for the forum, etc) using, say, the python bindi

There are probably a zillion other possible uses. The common thing when
exposing an API is that it ends up being used in a way nobody had
thought of. So it's dangerous to say "it's useless" or "nobody will do
it". You can be sure someone will, it's just a matter of time.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  7:01                       ` Marco Costalba
  2007-03-19  9:46                         ` Steve Frécinaux
@ 2007-03-19 10:33                         ` Steve Frécinaux
  2007-03-19 12:37                         ` Johannes Schindelin
  2 siblings, 0 replies; 62+ messages in thread
From: Steve Frécinaux @ 2007-03-19 10:33 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git

On Mon, 2007-03-19 at 08:01 +0100, Marco Costalba wrote:

> Please which are, in your opinion, the possible tools that could use a
> non-reentrant, blocking libgit? In case tool is already existent
> please write the name, in case it's a 'would be' one give a brief
> description.

Another idea that I just remembered about: two years ago there was a SoC
project to make nautilus (the file manager from gnome) able to version
directories. It was using SVN (and failed, but it's another story).

While nautilus is heavily multi-threaded, it's a "single-instance app",
so there is at most only one instance of nautilus ever running. Under
the hypothesis of a "versioned directories" support using libgit (that
would be easier to do and support since it doesn't need to set up a
server), it's quite obvious that a non-reentrant git would not be
enough: you are likely to have more than one versioned directory on
screen at the same time! OTOH, blocking doesn't look like an issue since
gnomevfs already deals with quite a number of blocking synchronous libs
and exposes an async API on top of those (similar to what QT Threading
does, I guess).

BTW, if some Gnome people are reading, if libgit comes into life, such a
project is something I'd like to see for real ;-)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  7:01                       ` Marco Costalba
  2007-03-19  9:46                         ` Steve Frécinaux
  2007-03-19 10:33                         ` Steve Frécinaux
@ 2007-03-19 12:37                         ` Johannes Schindelin
  2007-03-19 12:52                           ` Petr Baudis
  2007-03-19 13:04                           ` Marco Costalba
  2 siblings, 2 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-19 12:37 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Petr Baudis, Rocco Rutte, git, tytso, spearce

Hi,

On Mon, 19 Mar 2007, Marco Costalba wrote:

> On 3/19/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > I don't see _any_ problem in making an API which works with _one_ repo
> > first. This has several advantages:
> > 
> > - most users (if any!) will work that way,
> > 
> 
> Sometime could be useful to write a list of possible users before
> starting to code.

Fair enough.

I expect the most visible users of libgit to be: the core Git programs! 
Because if we don't eat our own dog food, why should anybody else?

And I am absolutely, utterly opposed to making them slower just to support a 
program which wants to cache meta data from multiple repositories.

Yes, you could write a program which can compare objects from several 
repos, but that is easy in fact: just set GIT_ALTERNATE_OBJECT_DIRECTORIES 
and you're done. Without changing the core of Git at all!

Having said that, I never liked the idea of having static variables to 
talk with config handlers, and would have preferred cb_data like 
for_each_ref() does. That is a low hanging fruit, which does not affect 
performance, and is _definitely_ a clean up.
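
To illustrate what I mean by cb_data, here is a rough sketch.  The names
(config_fn, for_each_config(), width_handler(), the "example.width" key)
are invented for this mail and are not the current config API; the point
is only that the handler's state travels through a void pointer instead
of sitting in a file-scope static.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type; the real config API may differ. */
typedef int (*config_fn)(const char *key, const char *value, void *cb_data);

struct config_entry {
	const char *key;
	const char *value;
};

/* Walk a table of entries, handing cb_data through untouched. */
static int for_each_config(const struct config_entry *entries, size_t nr,
			   config_fn fn, void *cb_data)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < nr; i++) {
		int ret = fn(entries[i].key, entries[i].value, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* State that would otherwise live in a static variable. */
struct width_state {
	int width;
	int seen;
};

/* Handler for the made-up key "example.width". */
static int width_handler(const char *key, const char *value, void *cb_data)
{
	struct width_state *state = cb_data;

	if (!strcmp(key, "example.width")) {
		state->width = atoi(value);
		state->seen = 1;
	}
	return 0;
}
```

Two callers can each pass their own struct width_state and never see
each other's values, and the walker itself costs one extra pointer
argument per call.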

I am not so sure about the impact of changing the index to a non-static 
structure.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19 12:37                         ` Johannes Schindelin
@ 2007-03-19 12:52                           ` Petr Baudis
  2007-03-19 13:55                             ` Johannes Schindelin
  2007-03-19 13:04                           ` Marco Costalba
  1 sibling, 1 reply; 62+ messages in thread
From: Petr Baudis @ 2007-03-19 12:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Marco Costalba, Rocco Rutte, git, tytso, spearce

On Mon, Mar 19, 2007 at 01:37:18PM CET, Johannes Schindelin wrote:
> Yes, you could write a program which can compare objects from several 
> repos, but that is easy in fact: just set GIT_ALTERNATE_OBJECT_DIRECTORIES 
> and you're done. Without changing the core of Git at all!

But you'll also need to access refs.

And the key point here is reentrance - handling multiple repositories at
once is only part of this, actually probably the much bigger customer
would be multi-threaded programs. And easier creation of reusable
components and other libraries, and so on...

I believe the performance impact will be most likely absolutely
negligible. Of course we have no hard data, but I doubt it's this where
most of the CPU crunching is.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19 12:37                         ` Johannes Schindelin
  2007-03-19 12:52                           ` Petr Baudis
@ 2007-03-19 13:04                           ` Marco Costalba
  1 sibling, 0 replies; 62+ messages in thread
From: Marco Costalba @ 2007-03-19 13:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Petr Baudis, Rocco Rutte, git, tytso, spearce

On 3/19/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Mon, 19 Mar 2007, Marco Costalba wrote:
>
> > On 3/19/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > I don't see _any_ problem in making an API which works with _one_ repo
> > > first. This has several advantages:
> > >
> > > - most users (if any!) will work that way,
> > >
> >
> > Sometime could be useful to write a list of possible users before
> > starting to code.
>
> Fair enough.
>
> I expect the most visible users of libgit to be: the core Git programs!
> Because if we don't eat our own dog food, why should anybody else?
>

But if you only eat your own food, why should others do the same?

> And I am absolutely utterly opposed to make them slower just to support a
> program which wants to cache meta data from multiple repositories.
>

The problem, at least with the viewers I know, is not with multiple
repositories but with multiple views of the same repo.

Anyway. Just to give my two cent:

The two possible features we are talking about are:

  - reentrancy (many views open on the same repo)

  - non-blocking behaviour (loading repo in background)

These two features are _very_ different. I agree an async library is
not a small thing, and it probably involves using an external thread
library in libgit itself, like pthread, just to avoid reinventing the
(difficult) wheel.

Regarding reentrancy I don't know what is involved in avoiding globals
and the like, but I would think it's really an absolute minimum to get
people eating your food ;-)

I completely agree that it's impossible to know how a library will be
used when you write it, but taking a good look around before you start
allows you to get a minimum subset of needed features, and if you add a
little bit of generalization and you are lucky enough, perhaps you will
avoid rewriting the library in the future.

From the viewers survey and also from Steve's interesting examples,
I would say that not planning for reentrancy would be a big
no-no.

  Marco

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19 12:52                           ` Petr Baudis
@ 2007-03-19 13:55                             ` Johannes Schindelin
  0 siblings, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-19 13:55 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Marco Costalba, Rocco Rutte, git, tytso, spearce

Hi,

On Mon, 19 Mar 2007, Petr Baudis wrote:

> On Mon, Mar 19, 2007 at 01:37:18PM CET, Johannes Schindelin wrote:
> > Yes, you could write a program which can compare objects from several 
> > repos, but that is easy in fact: just set GIT_ALTERNATE_OBJECT_DIRECTORIES 
> > and you're done. Without changing the core of Git at all!
> 
> But you'll also need to access refs.

Yes, and you want it to bake some fine pizza, too.

> And the key point here is reentrance - handling multiple repositories at 
> once is only part of this, actually probably the much bigger customer 
> would be multi-threaded programs. And easier creation of reusable 
> components and other libraries, and so on...
> 
> I believe the performance impact will be most likely absolutely 
> negligible. Of course we have no hard data, but I doubt it's this where 
> most of the CPU crunching is.

My time is very limited, and I see this thread going nowhere since 
everybody says "I like this, I like that", and nobody shows some hard data 
(me included). It almost feels like a Windows user community. Or Slashdot.

Anyway, I refuse to comment on these issues until somebody proves me wrong 
or right in my assumption that the impact on core Git (in terms of time 
_or_ lines of code) would be huge.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  2:56                       ` Theodore Tso
  2007-03-19  3:55                         ` Shawn O. Pearce
@ 2007-03-19 14:57                         ` Johannes Schindelin
  2007-03-19 16:28                         ` Linus Torvalds
  2 siblings, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2007-03-19 14:57 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Petr Baudis, Rocco Rutte, git

Hi,

On Sun, 18 Mar 2007, Theodore Tso wrote:

> On Mon, Mar 19, 2007 at 02:43:54AM +0100, Johannes Schindelin wrote:
> > >   I was talking about the API. The API has to be designed to be 
> > > reentrant. And you get pretty much stuck with the API. And requiring 
> > > reentrance isn't that far off once libgit is there, as I tried to point 
> > > out; it's not really any obscure requirement.
> > 
> > - it is easy enough to extend the API later, _retaining_ the small and 
> >   beautiful functions.
> 
> Um, look at what we had to do with gethostbyname() and 
> gethostbyname_r().  It wasn't possible to sweep through and fix all of 
> the programs that used gethostbyname(), despite the fact that if a 
> program called gethostbyname(), then called library function which 
> unknowingly to application, could possibly do a DNS or YP lookup (and 
> whose behavior could change depending on some config file like 
> /etc/nsswitch.conf), which would blow away the static information.  So 
> if the application tried to use the information returned by _its_ call 
> to gethostbyname after calling some other library function, it could get 
> some completely random hostname that wasn't what it expected.
> 
> Yelch!  And so we have two API's that libc has to support, 
> gethostbyname(), and gethostbyname_r(), with the ugly _r() suffix, and 
> which in a sane world most programs should use since otherwise they can 
> be incredibly fragile unless the _first_ thing they do after calling 
> gethostbyname is to copy the information to someplace stable, instead of 
> relying on the static buffer to remain sane.  (And yet they don't, which 
> means bugs that only show up if optional YP or Hesiod lookups are 
> enabled, etc.)
> 
> Berkeley got it horribly wrong when it tried to start with the "small and 
> beautiful" functions that were non-reentrant, and we've been paying the 
> price ever since.  Do we really want to support two versions of the API 
> forever?  Is it really that hard to support a reentrant API from the 
> beginning?  I'd submit the answer to these two questions are no, and no, 
> respectively.

You make a good case why gethostbyname() was wrong, and should have been 
defined as gethostbyname_r() to begin with.

However, as I wrote in another reply in this thread, I am not prepared to 
sink more time in this discussion, _unless_ somebody who cares about it 
enough shows me some code and/or numbers.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-18 19:31                     ` Junio C Hamano
@ 2007-03-19 16:09                       ` Luiz Fernando N. Capitulino
  0 siblings, 0 replies; 62+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-03-19 16:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Petr Baudis, git

Em Sun, 18 Mar 2007 12:31:13 -0700
Junio C Hamano <junkio@cox.net> escreveu:

| "Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br>
| writes:
| 
| >  I mean, if the information needed to print the error message (packfile
| > name and version in this example) is available to the caller, or the
| > caller can get it someway, then the caller could check which error
| > he got and build the message himself.
| >
| >  That seems simpler to me, considering the caller has the needed
| > info, of course...
| 
| It's a possibility, but that would make it much less nice to
| diagnose and debug problems, as the caller does not usually have
| necessary information.
| 
| The caller may ask for object A, and the error is triggered
| because a different object C is missing, which is the delta base
| of object B which in turn is the delta base of object A.  The
| best your "caller" can say is "cannot read object A for some
| reason", and it cannot say "cannot read object A because object
| C is missing".

 Okay, you're right. I'm going to let the low-level functions
fill the error buffer then.
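
 Something along these lines is what I have in mind. This is only a
toy sketch: struct err_buf, read_object() and the little object table
are invented for this mail and are not existing code. The low-level
reader walks the delta chain and is the one that writes the message,
because only it knows which object is really missing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error buffer; not an existing libgit type. */
struct err_buf {
	char msg[128];
};

/* Toy object table: each object names its delta base ("" for none). */
struct toy_object {
	const char *name;
	const char *base;
};

static const struct toy_object objects[] = {
	{ "A", "B" },
	{ "B", "C" },
	{ "D", "" },
	/* "C" is deliberately absent. */
};

static const struct toy_object *find_object(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(objects) / sizeof(objects[0]); i++)
		if (!strcmp(objects[i].name, name))
			return &objects[i];
	return NULL;
}

/* Low-level reader: fills err with the object that is actually missing. */
static int read_object(const char *name, struct err_buf *err)
{
	const struct toy_object *obj = find_object(name);

	assert(err != NULL);
	if (!obj) {
		snprintf(err->msg, sizeof(err->msg),
			 "object %s is missing", name);
		return -1;
	}
	if (obj->base[0])
		return read_object(obj->base, err);
	return 0;
}
```

 A caller asking for A gets -1 back and finds "object C is missing" in
the buffer, which it can then wrap in its own "cannot read object A"
message.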

 Thanks,

-- 
Luiz Fernando N. Capitulino

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19  2:56                       ` Theodore Tso
  2007-03-19  3:55                         ` Shawn O. Pearce
  2007-03-19 14:57                         ` Johannes Schindelin
@ 2007-03-19 16:28                         ` Linus Torvalds
  2007-03-19 16:32                           ` Linus Torvalds
  2007-03-21 11:17                           ` Andreas Ericsson
  2 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2007-03-19 16:28 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Johannes Schindelin, Petr Baudis, Rocco Rutte, git

On Sun, 18 Mar 2007, Theodore Tso wrote:
> 
> Berkeley got it horribly wrong when it tried to start with the "small
> and beautiful" functions that were non-reentrant, and we've been
> paying the price ever since.

I don't think that's a good argument, ESPECIALLY when coming from somebody 
from MIT.

Berkeley may have gotten it "horribly wrong", but the fact is, BSD kicked 
ass and took over the world, in a way that nothing comparable I know of 
from MIT ever did. Exactly *because* the BSD people didn't try to make it 
perfect, but made things "small and easy to *implement*".

(I would not say "small and beautiful". "Beauty" had nothing to do with 
it. "simple" had. And unlike beauty, simplicity really *is* more than skin 
deep, and is a fundamentally good design).

I'm a *huge* believer in "Worse is Better" (for people who don't know it, 
just google for that phrase, with the quotes around it).

In fact, I'd argue that the reason git kicks ass is exactly that "Worse is 
Better" design: you need to have a few conceptual (good) ideas to base 
your design on, but given those good ideas, it's more important that 
things _work_well_in_practice_ than some "wouldn't it be better.." kind of 
mentality.

The "paying the price ever since" argument is bogus. If you get to that 
point, you've by definition *already*won*! 

Here's the real world according to Linus:
 1) everybody makes mistakes
 2) only the winners "pay the price" of those mistakes ever since, since 
    the losers will not be around to pay it, and the winners will have 
    made mistakes too (see #1)
 3) the more complex and subtle you make the interfaces, the more mistakes 
    you'll make, AND the less likely you are to be a winner anyway, since 
    you'll have problems implementing it *and* it will probably be subtle 
    to use too!

So the motto should always be: "Just Do It!", and screw worrying about 
paying the price. You *want* to have to pay the price. It's the best thing 
that can ever happen to you. And you want to have to start paying the 
price as early as possible - because that not only means that you won, it 
also means that you'll now be learning from your mistakes instead of 
trying to anticipate them, and I will *guarantee* that learning from 
mistakes is going to be a lot more productive than trying to worry about 
them up-front.

> Do we really want to support two versions of the API forever?

I'd personally strongly vote for a "simple library" interface as a first 
cut.

And yes, if that means supporting two versions, I think it's better. You 
can easily have "libgit-simple.a" for trivial non-threaded accesses with 
out-of-memory conditions causing the process to die. That really *is* a 
very useful scenario, as shown by the fact that *every*single*core*git 
program has been happy with it.

Claiming that you need a complicated interface in the face of the *proof* 
that git itself doesn't need that complicated an interface is to me a bit 
disingenuous.

Yes, *some* people will want a thread-safe one. But we're not talking 
something like libc here, where the library is so fundamental that it 
needs to be acceptable for everybody. It's perfectly possible to have a 
"libgit-simple.a" that is good for 99% of all uses, and that is simple to 
use, and less bug-prone simply because it is *simpler* (not just for 
users, but as an implementation).

And then for the small small minority of programs that want something 
fancier, do a "libgit-complicated.a" library. IF you ever get it working 
and complete, you can always then implement "libgit-simple" in terms of 
the complicated version.
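
As a rough sketch of what "simple in terms of complicated" could look 
like (every name here is invented: struct git_ctx, ctx_parse_mode() and 
simple_parse_mode() exist nowhere, and parsing an octal mode is just a 
stand-in for real work): the fancy flavor takes an explicit context and 
returns errors, the simple flavor hides one context and dies.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical context for the "complicated" flavor; all names made up. */
struct git_ctx {
	char error[128];
};

/* Reentrant flavor: explicit context, errors go back to the caller. */
static int ctx_parse_mode(struct git_ctx *ctx, const char *text,
			  unsigned *mode)
{
	char *end;
	unsigned long val;

	assert(ctx != NULL && mode != NULL);
	val = strtoul(text, &end, 8);
	if (end == text || *end) {
		snprintf(ctx->error, sizeof(ctx->error),
			 "bad mode '%s'", text);
		return -1;
	}
	*mode = (unsigned)val;
	return 0;
}

/* Simple flavor: one hidden context, and failure ends the process. */
static unsigned simple_parse_mode(const char *text)
{
	static struct git_ctx default_ctx;
	unsigned mode;

	if (ctx_parse_mode(&default_ctx, text, &mode) < 0) {
		fprintf(stderr, "fatal: %s\n", default_ctx.error);
		exit(128);
	}
	return mode;
}
```

The simple wrapper is a handful of lines, so carrying both flavors 
need not mean maintaining two implementations.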

  Is it 
> really that hard to support a reentrant API from the beginning?  I'd 
> submit the answer to these two questions are no, and no, respectively.
> 
> 						- Ted
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19 16:28                         ` Linus Torvalds
@ 2007-03-19 16:32                           ` Linus Torvalds
  2007-03-21 11:17                           ` Andreas Ericsson
  1 sibling, 0 replies; 62+ messages in thread
From: Linus Torvalds @ 2007-03-19 16:32 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Johannes Schindelin, Petr Baudis, Rocco Rutte, git

Oops. My fingers are faster than my brain, and that email got sent out 
half-completed and without the final editing. But it wasn't really missing 
anything else than editing away the parts of the original I didn't respond 
to, and my normal sign-off.

So I'll just sign this one off twice, instead..

		Linus

		Linus

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-19 16:28                         ` Linus Torvalds
  2007-03-19 16:32                           ` Linus Torvalds
@ 2007-03-21 11:17                           ` Andreas Ericsson
  2007-03-21 17:24                             ` Linus Torvalds
  1 sibling, 1 reply; 62+ messages in thread
From: Andreas Ericsson @ 2007-03-21 11:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Johannes Schindelin, Petr Baudis, Rocco Rutte, git

Linus Torvalds wrote:
> 
> I'm a *huge* believer in "Worse is Better" (for people who don't know it, 
> just google for that phrase, with the quotes around it).
> 

I just did, and having read the first page of the document found at 
http://www.jwz.org/doc/worse-is-better.html, I must say "worse-is-better"
sounds an awful lot like evolution; "Start with something that works. When
something else works better, jump train and embrace The New Thing".

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-21 11:17                           ` Andreas Ericsson
@ 2007-03-21 17:24                             ` Linus Torvalds
  2007-03-22  9:51                               ` Andreas Ericsson
  0 siblings, 1 reply; 62+ messages in thread
From: Linus Torvalds @ 2007-03-21 17:24 UTC (permalink / raw)
  To: Andreas Ericsson
  Cc: Theodore Tso, Johannes Schindelin, Petr Baudis, Rocco Rutte, git

On Wed, 21 Mar 2007, Andreas Ericsson wrote:

> Linus Torvalds wrote:
> > 
> > I'm a *huge* believer in "Worse is Better" (for people who don't know it, 
> > just google for that phrase, with the quotes around it).
> 
> I just did, and having read the first page of the document found at 
> http://www.jwz.org/doc/worse-is-better.html, I must say "worse-is-better"
> sounds an awful lot like evolution; "Start with something that works. When
> something else works better, jump train and embrace The New Thing".

Yeah. I'm a huge believer in evolution too (and not just the biological 
kind ;)

The thing is, most "designers" are just totally clueless. Even the 
smartest people that have done something similar five times before are 
prone to totally mis-design something if they start from scratch and try 
to "think it through". You tend to concentrate on the problems of the 
previous generation, and not even think about everything that worked 
wonderfully well, because that wasn't something you *needed* to think 
about.

So "designing" stuff is way overrated. You can spend years designing 
something that is total crap, just because you didn't actually try it out 
and _realize_ that it wasn't what the user wanted (it may have been what 
the user _thought_ and _claimed_ that he wanted, but that was before he 
actually tried to use it, and realized that he was wrong).

			Linus

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Libification project (SoC)
  2007-03-21 17:24                             ` Linus Torvalds
@ 2007-03-22  9:51                               ` Andreas Ericsson
  0 siblings, 0 replies; 62+ messages in thread
From: Andreas Ericsson @ 2007-03-22  9:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Johannes Schindelin, Petr Baudis, Rocco Rutte, git

Linus Torvalds wrote:
> 
> On Wed, 21 Mar 2007, Andreas Ericsson wrote:
> 
>> Linus Torvalds wrote:
>>> I'm a *huge* believer in "Worse is Better" (for people who don't know it, 
>>> just google for that phrase, with the quotes around it).
>> I just did, and having read the first page of the document found at 
>> http://www.jwz.org/doc/worse-is-better.html, I must say "worse-is-better"
>> sounds an awful lot like evolution; "Start with something that works. When
>> something else works better, jump train and embrace The New Thing".
> 
> Yeah. I'm a huge believer in evolution too (and not just the biological 
> kind ;)
> 
> So "designing" stuff is way overrated. You can spend years designing 
> something that is total crap, just because you didn't actually try it out 
> and _realize_ that it wasn't what the user wanted (it may have been what 
> the user _thought_ and _claimed_ that he wanted, but that was before he 
> actually tried to use it, and realized that he was wrong).
> 

Indeed. That's probably why Extreme Programming (silly hype-name, but what
to call it otherwise?) has gained so much popularity from the people that
really understand the concept.

To those that don't wish to google for it, Extreme Programming is about
taking small steps that lead to a diffuse goal ("We shall make a fantasy
video game that millions of people would like to play. Significant lore
is here, here and here"). 

The goal and any of the steps might change along the way. Basically, it
puts "re-think, re-design, re-factor" on the table for corporate software
production and promotes rapid implementation over correctness.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2007-03-22  9:51 UTC | newest]

Thread overview: 62+ messages
2007-03-16  4:24 Libification project (SoC) Luiz Fernando N. Capitulino
2007-03-16  4:59 ` Shawn O. Pearce
2007-03-16  5:30   ` Junio C Hamano
2007-03-16  6:00     ` Shawn O. Pearce
2007-03-16  6:54       ` Junio C Hamano
2007-03-16 11:54         ` Johannes Schindelin
2007-03-16 13:09           ` Rocco Rutte
2007-03-16 15:12             ` Johannes Schindelin
2007-03-16 15:55               ` Nicolas Pitre
2007-03-16 16:13                 ` Johannes Schindelin
2007-03-16 16:26                   ` Nicolas Pitre
2007-03-16 18:22                     ` Steve Frécinaux
2007-03-16 18:53                       ` Nicolas Pitre
2007-03-18 13:57                         ` Petr Baudis
2007-03-16 23:26                     ` Johannes Schindelin
2007-03-16 16:17                 ` Shawn O. Pearce
2007-03-16 18:20               ` Marco Costalba
2007-03-16 18:38                 ` Marco Costalba
2007-03-16 18:59                   ` Nicolas Pitre
2007-03-16 21:07                     ` Marco Costalba
2007-03-16 23:24                       ` Johannes Schindelin
2007-03-17  7:04                         ` Marco Costalba
2007-03-17 17:29                           ` Johannes Schindelin
2007-03-16 19:09                   ` Andy Parkins
2007-03-18 14:08               ` Petr Baudis
2007-03-18 23:48                 ` Johannes Schindelin
2007-03-19  1:21                   ` Petr Baudis
2007-03-19  1:43                     ` Johannes Schindelin
2007-03-19  2:56                       ` Theodore Tso
2007-03-19  3:55                         ` Shawn O. Pearce
2007-03-19 14:57                         ` Johannes Schindelin
2007-03-19 16:28                         ` Linus Torvalds
2007-03-19 16:32                           ` Linus Torvalds
2007-03-21 11:17                           ` Andreas Ericsson
2007-03-21 17:24                             ` Linus Torvalds
2007-03-22  9:51                               ` Andreas Ericsson
2007-03-19  7:01                       ` Marco Costalba
2007-03-19  9:46                         ` Steve Frécinaux
2007-03-19 10:33                         ` Steve Frécinaux
2007-03-19 12:37                         ` Johannes Schindelin
2007-03-19 12:52                           ` Petr Baudis
2007-03-19 13:55                             ` Johannes Schindelin
2007-03-19 13:04                           ` Marco Costalba
2007-03-16 12:53     ` Petr Baudis
2007-03-16 13:47     ` Luiz Fernando N. Capitulino
2007-03-16 14:08       ` Petr Baudis
2007-03-16 18:38         ` Luiz Fernando N. Capitulino
2007-03-16 23:16           ` Shawn O. Pearce
2007-03-17 19:58             ` Luiz Fernando N. Capitulino
2007-03-18  5:23               ` Shawn O. Pearce
2007-03-18  5:52                 ` Junio C Hamano
2007-03-18 16:18                   ` Luiz Fernando N. Capitulino
2007-03-18 19:31                     ` Junio C Hamano
2007-03-19 16:09                       ` Luiz Fernando N. Capitulino
2007-03-18 21:15                     ` Nicolas Pitre
2007-03-16 15:16       ` Johannes Schindelin
2007-03-16  8:06   ` Johannes Sixt
2007-03-16  8:58     ` Matthieu Moy
2007-03-16 11:51       ` Johannes Schindelin
2007-03-16 12:55   ` Petr Baudis
2007-03-17  2:24 ` Jakub Narebski
2007-03-17  5:22   ` Shawn O. Pearce
