git.vger.kernel.org archive mirror
* Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
@ 2011-03-20 10:55 Pavel Raiskup
  2011-03-20 18:06 ` Shawn Pearce
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Pavel Raiskup @ 2011-03-20 10:55 UTC (permalink / raw)
  To: git

Hi Git community!

I'd like to ask you for some details about "histogram diff" and "libgit"
enhancement/git-merge tasks for this year's GSOC.

Histogram diff:
There is no mentor listed in [1]. Does that mean nobody is available to
mentor this task, or can it be mentored by anyone listed for the other
tasks? I'd very much like to do this task. After a short look around the
git/jgit source code, it looks feasible to me.
The stated goal is "Get this feature merged to the upstream git." -- but I
have one theoretical question: what if the benchmarking/study of histogram
diff leads to the conclusion that this algorithm will not be useful for
upstream? Does that mean "fail" in terms of GSOC? I have to think about it
even though a speedup looks quite likely. I don't want to fail
a priori :).

libgit2:
I really like the concept of a library that can have bindings for dozens of
languages -- this spreads the functionality to many users almost
everywhere. Here I like the idea of implementing new features inside the
library (diff, config file parsing), but maybe also the task of merging
libgit2 into git upstream. Basically I don't know much about that, and
you wrote that this task is more difficult than the others, so I would
probably need to study git's and libgit2's architecture very carefully
beforehand. Could you tell me some details about it? Is it impossible to
finish before the GSOC deadline, and is it worth a serious effort (from your
point of view on the project objectives)? How big are the requirements
for this task in terms of GSOC?

It is quite a hectic time for me right now because of my studies :) It has
been a long time since I had time for myself, but I'd like to prepare some
patch to prove my interest and abilities.

====
And now the not so important part of the message (you can skip it). I plan
to write this information up in more detail on google-melange later.

Something about me || I am:
-- I like the C language, but I have no problem studying other commonly
    used languages more deeply (I only need a little warm-up),
-- interested in Open Source in general, programming (especially
    parallel programming), chess and challenges,
-- a master's degree student at BUT (CZ), in my penultimate year of study;
    this is my last summer :(
-- a fan of Git for many reasons; I'd like to become a contributor even
    if the GSOC opportunity doesn't come,
-- not such a good English speaker, so sometimes my messages may be a little
    harder to understand.

Experiences:
So far most of my experience comes from school projects (programming
projects are a kind of evergreen here in Brno). But I have had one Open
Source experience -- an enhancement for Daniel Stenberg's libcurl [2],
followed by some follow-up patches. The main patch implements shell-like
wildcard pattern matching for the FTP protocol and extends the API so that
the same functionality can be implemented for other protocols.
(I implemented a compiler for wildcard patterns such as "*.txt" and
"[a-z]???.txt", an automatic testing script, an enhancement of the FTP test
server inside libcurl, man pages, ...)
The most difficult part was understanding how the curl library works inside
-- but I think I'm better at that now, so I think I can do some useful
work for Git too.
====

Don't worry please, my next messages will be much briefer :)

Pavel

[1] https://git.wiki.kernel.org/index.php/SoC2011Ideas
[2] https://github.com/bagder/curl/commit/0825cd80a62c21725fb3615f1fdd3aa6cc5f0f34


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 10:55 Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC) Pavel Raiskup
@ 2011-03-20 18:06 ` Shawn Pearce
  2011-03-22 12:32   ` Pavel Raiskup
  2011-03-20 18:25 ` Junio C Hamano
  2011-03-20 21:01 ` Vicent Marti
  2 siblings, 1 reply; 13+ messages in thread
From: Shawn Pearce @ 2011-03-20 18:06 UTC (permalink / raw)
  To: Pavel Raiskup; +Cc: git

On Sun, Mar 20, 2011 at 03:55, Pavel Raiskup <xraisk00@gmail.com> wrote:
> I'd like to ask you for some details about "histogram diff" and "libgit"
> enhancement/git-merge tasks for this year's GSOC.
>
> Histogram diff:
> There is no mentor mentioned in [1]. Does it mean that there is no person
> who can be a mentor for this task or is that assignment possible to be
> mentored by everyone mentioned in other tasks? I'd like to do this task very
> much. After doing a small observing around source code of git/jgit it looks
> feasible for me.

As the original author of HistogramDiff in JGit, and a contributor to
C Git... I'm probably the best person to mentor this task. I'm really
busy, so I didn't sign up to mentor anything else this year, but I
think I would make time for this project.

> There is a goal "Get this feature merged to the upstream git." -- but I have
> one theoretical question -- what if the benchmarking/study of histogram diff
> leads to conclusion that this algorithm will not be useful for upstream?

Then the project doesn't merge. :-)

> Does it mean "fail" in terms of GSOC? I have to think about it even if it
> looks that there should be speedup quite obvious. I don't want to fail
> a priory :).

I don't think so.

I think the project is a success if the code is of a quality that
upstream would accept, and if the final analysis data makes it
clear whether or not it's worth including. It's probably not worth
including if it's the same speed as the current Myers diff
implementation from libxdiff, or slower. But if it's 2x faster, it's
probably worth merging, provided the code quality is acceptable to the
upstream maintainers.

> [1] https://git.wiki.kernel.org/index.php/SoC2011Ideas

-- 
Shawn.


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 10:55 Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC) Pavel Raiskup
  2011-03-20 18:06 ` Shawn Pearce
@ 2011-03-20 18:25 ` Junio C Hamano
  2011-03-20 21:01 ` Vicent Marti
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2011-03-20 18:25 UTC (permalink / raw)
  To: Pavel Raiskup; +Cc: git

"Pavel Raiskup" <xraisk00@gmail.com> writes:

> I have one theoretical question -- what if the benchmarking/study of
> histogram diff leads to conclusion that this algorithm will not be
> useful for upstream?  Does it mean "fail" in terms of GSOC?

Not necessarily. A negative result is often as valuable as a positive
result.

It will take a clearly good implementation to justify why a negative
result is a success, though. If it is clear to the reviewers that the
implementation is poorly done, the negative conclusion does not
necessarily mean that use of the histogram algorithm is a bad
approach---it would just mean that the particular implementation was
done poorly, and then the GSoC task may have to be marked as a
failure. But otherwise, if the submission is done with the usual code
quality we would expect from contributors and its result (either positive
or negative) is explained well in its log message, I would say it should
be considered a "success".


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 10:55 Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC) Pavel Raiskup
  2011-03-20 18:06 ` Shawn Pearce
  2011-03-20 18:25 ` Junio C Hamano
@ 2011-03-20 21:01 ` Vicent Marti
  2011-03-20 23:44   ` Jeff King
                     ` (2 more replies)
  2 siblings, 3 replies; 13+ messages in thread
From: Vicent Marti @ 2011-03-20 21:01 UTC (permalink / raw)
  To: Pavel Raiskup; +Cc: git

Yo!

On Sun, Mar 20, 2011 at 12:55 PM, Pavel Raiskup <xraisk00@gmail.com> wrote:
> libgit2:
> I really like the concept of libraries for to be binding-able from dozens of
> languages - this leads to expanding functionality among masses users
> almost everywhere. In this part I like the idea of implementing new features
> inside library (diff, config file parsing) but also maybe the task of
> merging
> libgit2 into git upstream. Basically I don't know much about that.. and
> you wrote that this task is more difficult then others, so I probably need
> to study git's and libgit's architecture very precisely beforehand .. but
> could you tell me some details about that? Is it impossible to do it before
> GSOC deadline and is it worth making a serious big efforts to this task
> (from your point of view onto project objectives)? How big are requirements
> for this task in term of GSOC?

Merging libgit2 into upstream Git is a scary as fuck task. Somebody
put it up on the Wiki ideas page, but that was not me -- I'm
personally doubtful of anybody succeeding at that project during
the SoC, so I have very little interest in mentoring the task.

Here's what's going on: The Git code base is hairy and not that well
documented, so you're gonna need to study that quite a bit. I like to
think that the libgit2 code base is not hairy, and is pretty well
documented (I'm an optimistic guy), but you're still going to need
quite a bit of research to understand the whole architecture before
you can actually merge anything into Git.

You could try to port just some selected parts of the library to
Git (i.e. the parts which benchmark faster than their Git
counterparts), but the interdependency chain of libgit2 internals is
not going to be pretty, embedding into the Git core is not going to be
easy (libgit2 is reentrant and mostly threadsafe, so there's quite the
architecture mismatch there), and there's no guarantee that the final
implementation is going to be faster once it's in there.

Overall, you'd need balls of steel and a lot of spare time and
interest to accomplish anything significant with this task, so my
personal opinion as a very old wise man is to forget about it.

HOWEVER. If you want to do something libgit2-related for the SoC
(which would be awesome), there are still two options:

a) Help us make the library more awesome by implementing new features!
This task is the opposite of the previous one; it's like full of unicorns
and rainbows. You can choose one (or more) features we are missing,
and see how to implement them in libgit2 while making them reentrant,
threadsafe AND faster. It's not easy, but it's fucking cool. And you
get to do a lot of micro-optimization if you're into that.

b) Write a minimal Git client using libgit2. Peff keeps bringing this
up and I think it's a bangin' good idea. Write something small and
100% self contained in a C executable that runs everywhere with 0
dependencies -- don't aim for full feature completion, just the basic
stuff to interoperate with a Git repository. Clone, checkout, branch,
commit, push, pull, log. I would totally use that shit on my Windows
boxes. And since it'll be externally compatible with the original Git
client, we can reuse the Git unit tests to test libgit2. HA. Awesome!

So, yeah. That's pretty much my libgit2-related advice for the SoC.

Best of luck with your application process with whatever project you decide,
Vicent


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 21:01 ` Vicent Marti
@ 2011-03-20 23:44   ` Jeff King
  2011-03-21  0:38     ` Vicent Marti
  2011-03-22 17:32     ` Pavel Raiskup
  2011-03-21  1:27   ` Jonathan Nieder
  2011-03-23  0:24   ` Vincent van Ravesteijn
  2 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2011-03-20 23:44 UTC (permalink / raw)
  To: Vicent Marti; +Cc: Pavel Raiskup, git

On Sun, Mar 20, 2011 at 11:01:25PM +0200, Vicent Marti wrote:

> b) Write a minimal Git client using libgit2. Peff keeps bringing this
> up and I think it's a bangin' good idea. Write something small and
> 100% self contained in a C executable that runs everywhere with 0
> dependencies -- don't aim for full feature completion, just the basic
> stuff to interoperate with a Git repository. Clone, checkout, branch,
> commit, push, pull, log. I would totally use that shit on my Windows
> boxes. And since it'll be externally compatible with the original Git
> client, we can reuse the Git unit tests to test libgit2. HA. Awesome!

Yeah, I would be happy to mentor or co-mentor with Vicent on a project
like that. Not only might it be useful to actually _use_, but my secret
motive is that I'd like to start testing libgit2 using some of the
regular git tests, both for interoperability and for performance.

-Peff


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 23:44   ` Jeff King
@ 2011-03-21  0:38     ` Vicent Marti
  2011-03-22 17:32     ` Pavel Raiskup
  1 sibling, 0 replies; 13+ messages in thread
From: Vicent Marti @ 2011-03-21  0:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Pavel Raiskup, git

On Mon, Mar 21, 2011 at 1:44 AM, Jeff King <peff@github.com> wrote:
> Yeah, I would be happy to mentor or co-mentor with Vicent on a project
> like that. Not only might it be useful to actually _use_, but my secret
> motive is that I'd like to start testing libgit2 using some of the
> regular git tests, both for interoperability and for performance.

Right on! I've just added this task to the wiki so other prospective
students can take it into account, and listed you as a possible
mentor.

While I'm at it, I removed the "merge libgit2 into mainstream" task
from there. Feel free to re-add it if you find a suitable mentor
-- I'm getting diarrhea just by thinking about it.

Cheers,
Vicent


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 21:01 ` Vicent Marti
  2011-03-20 23:44   ` Jeff King
@ 2011-03-21  1:27   ` Jonathan Nieder
  2011-03-22 16:43     ` Pavel Raiskup
  2011-03-23  0:24   ` Vincent van Ravesteijn
  2 siblings, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2011-03-21  1:27 UTC (permalink / raw)
  To: Vicent Marti; +Cc: Pavel Raiskup, git, Jeff King, Ramkumar Ramachandra

Hi,

Vicent Marti wrote:

> Merging libgit2 into upstream Git is a scary as fuck task. Somebody
> put it up on the Wiki ideas page, but that was not me

Cc-ing Ram (who added it), in case he has anything to add.

> -- I'm
> personally doubtful of anybody succeeding on doing that project during
> the SoC,

I agree there --- it is a huge task.  But maybe it could inspire
someone to come up with a smaller task.  One long-term goal might be
to get libgit2 and core git to share revision walking APIs; a baby
step towards that would be a proof-of-concept patch to share object
access APIs.

If someone wants to work on this, I'd be glad to talk over what would
be needed to make a realistic proposal.

> so I have very little interest on mentoring the task.

That's okay, of course.  What's probably important for people
considering this project is: would you be willing to answer questions
and consider patches from a person working on this?  That is, do you
consider the goal even worthwhile?

I am probably not the best person to mentor this but if no one else
wants to then I would be interested.

> Here's what's going on: The Git code base is hairy and not that well
> documented, so you're gonna need to study that quite a bit. I like to
> think that the libgit2 code base is not hairy, and is pretty well
> documented (I'm an optimistic guy), but you're still going to need
> quite a bit of research to understand the whole architecture before
> you can actually merge anything into Git.

Like the Linux kernel, the git codebase does not have many comments
alongside the code, it is true.  But it is actually incredibly well
documented in my experience.  The best documentation is in the
history.  In addition to that, there is some API documentation in
Documentation/technical.

A good place to start is the initial commit e83c516 (Initial revision
of "git", the information manager from hell, 2005-04-07).  The
architecture described therein is very simple and still exists today
with few changes.

To explain something that has come later, the easiest way is to learn
how the author explained it when the change was made.

Let me give an example.  Suppose I am wondering how git decides what
commits to show when I say "git log ^topic1 topic2".  In particular, I
wonder what the performance characteristics of that operation are and
how it is able to print the first result without spending O(depth of
history) to traverse all the ancestors of topic1 going back to the
beginning of time.

First step: what does "git log" do with that "^topic1 topic2"?  Wait,
where is the "log" command defined in the first place?

 $ git grep -e '"log"'
[...]
 git.c:          { "log", cmd_log, RUN_SETUP },
[...]

Ok, it's the cmd_log function.  Looking at the definition of that
function, it seems that it does

	init_revisions(&rev, prefix);
	rev.always_show_header = 1;
	memset(&opt, 0, sizeof(opt));
	opt.def = "HEAD";
	cmd_log_init(argc, argv, prefix, &rev, &opt);
	return cmd_log_walk(&rev);

 $ git grep -e init_revisions -- Documentation
 Documentation/technical/api-revision-walking.txt:`init_revisions`::

The revision walking API is explained in the api-revision-walking.txt
document.  From this we learn that responsibility for the revision
walk is divided between prepare_revision_walk and get_revision,
defined in revision.c.

prepare_revision_walk seems to use functions "handle_commit" and
"commit_list_insert_by_date".  What do they do?

 $ git log -p -Shandle_commit -- revision.c
 commit cd2bdc5309461034e5cc58e1d3e87535ed9e093b
 Author: Linus Torvalds <torvalds@osdl.org>
 Date:   Fri Apr 14 16:52:13 2006 -0700

     Common option parsing for "git log --diff" and friends

     This basically does a few things that are sadly somewhat interdependent,
[...]
     Now, that was the easy and straightforward part.

     The slightly more involved part is that some of the programs that want to
     use the new-and-improved rev_info parsing don't actually want _commits_,
     they may want tree'ish arguments instead. That meant that I had to change
     setup_revision() to parse the arguments not into the "revs->commits" list,
     but into the "revs->pending_objects" list.
    
     Then, when we do "prepare_revision_walk()", we walk that list, and create
     the sorted commit list from there.

Okay: so in revision walking:

 - first (in setup_revisions), git pushes the ^topic1 and topic2
   commits onto a list called "pending_objects";
 - next, in prepare_revision_walk, it walks through the pending
   objects list and inserts them in a commit list, sorted by date;

and next?

 $ git log -Sget_revision -- revision.c
[...]
 commit a4a88b2bab3b6fb0b30f63418701f42388e0fe0a
 Author: Linus Torvalds <torvalds@osdl.org>
 Date:   Tue Feb 28 11:24:00 2006 -0800

     git-rev-list libification: rev-list walking

     This actually moves the "meat" of the revision walking from rev-list.c
     to the new library code in revision.h. It introduces the new functions

         void prepare_revision_walk(struct rev_info *revs);
         struct commit *get_revision(struct rev_info *revs);

     to prepare and then walk the revisions that we have.

     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
     Signed-off-by: Junio C Hamano <junkio@cox.net>

Well, that's actually not so helpful.  I mean, it tells us that
get_revision is what takes care of the revision walk, but it doesn't
tell us what the revision walk consists of.

So here we need another trick to get at the meat of the matter ---
we need to know where this "revision walking from rev-list.c" came
from.  Ah:

 $ git log -- rev-list.c
[...]
 commit 64745109c41a5c4a66b9e3df6bca2fd4abf60d48
 Author: Linus Torvalds <torvalds@ppc970.osdl.org>
 Date:   Sat Apr 23 19:04:40 2005 -0700

     Add "rev-list" program that uses the new time-based commit listing.

     This is probably what you'd want to see for "git log".

And the answer is there in the patch for a commit that comes after that
(8906300, git-rev-list: use proper lazy reachability analysis,
2005-05-30).
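
To make the shape of such a walk concrete, here is a toy sketch in C. It
is not git's code: the struct layout, the fixed-size queue and the names
add_pending and next_commit are made up for illustration, and it assumes
every commit is strictly newer than its parents (real history can have
clock skew, which a real implementation has to tolerate). Commits are taken
newest-first from a date-sorted list, an excluded commit passes its
exclusion on to its parents, and the walk stops as soon as everything left
in the list is excluded.

```c
#include <assert.h>
#include <stddef.h>

#define SEEN          (1u << 0)  /* commit has been put on the list */
#define UNINTERESTING (1u << 1)  /* commit is reachable from a "^rev" */
#define MAX_PARENTS 2
#define MAX_QUEUE   64

/* Toy commit object; illustrative only, not git's struct commit. */
struct commit {
	const char *name;
	long date;
	struct commit *parents[MAX_PARENTS];
	unsigned flags;
};

/* Pending commits, kept sorted by date with the newest first. */
struct queue {
	struct commit *items[MAX_QUEUE];
	size_t len;
};

static void insert_by_date(struct queue *q, struct commit *c)
{
	size_t i = q->len;

	assert(q->len < MAX_QUEUE);
	while (i > 0 && q->items[i - 1]->date < c->date) {
		q->items[i] = q->items[i - 1];
		i--;
	}
	q->items[i] = c;
	q->len++;
}

static struct commit *pop_newest(struct queue *q)
{
	struct commit *c = q->items[0];
	size_t i;

	for (i = 1; i < q->len; i++)
		q->items[i - 1] = q->items[i];
	q->len--;
	return c;
}

static int everybody_uninteresting(const struct queue *q)
{
	size_t i;

	for (i = 0; i < q->len; i++)
		if (!(q->items[i]->flags & UNINTERESTING))
			return 0;
	return 1;
}

/* Queue a starting point: flags is 0 for "topic2", UNINTERESTING for "^topic1". */
void add_pending(struct queue *q, struct commit *c, unsigned flags)
{
	int queued = c->flags & SEEN;

	c->flags |= flags | SEEN;
	if (!queued)
		insert_by_date(q, c);
}

/* Return the next commit to show, or NULL once only excluded commits remain. */
struct commit *next_commit(struct queue *q)
{
	while (q->len > 0 && !everybody_uninteresting(q)) {
		struct commit *c = pop_newest(q);
		size_t i;

		for (i = 0; i < MAX_PARENTS && c->parents[i]; i++) {
			struct commit *p = c->parents[i];

			/* An excluded commit excludes its parents, too. */
			p->flags |= c->flags & UNINTERESTING;
			if (p->flags & SEEN)
				continue;
			p->flags |= SEEN;
			insert_by_date(q, p);
		}
		if (!(c->flags & UNINTERESTING))
			return c;
	}
	return NULL;
}
```

With a history A <- B <- C (topic1) and B <- D <- E (topic2), queueing C as
UNINTERESTING and E as a normal starting point makes next_commit return E,
then D, then NULL, and A is never looked at. The cost depends on how far
the two topics have diverged, not on the depth of the history.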

Heh, probably I didn't choose the best example. :)  A short article
about this in Documentation/technical certainly wouldn't be a bad
thing.

In addition to "git log -S" as used above, I tend to find "git blame -L"
helpful FWIW.  And people on the list can be helpful, too.

> (libgit2 is reentrant and mostly threadsafe, so there's quite the
> architecture mismatch there),

Could you expand on that a little?  I understand that a lot of git
code wouldn't be usable for libgit2 as-is and that there is going to
be some overhead from, say, using malloc to initialize buffers instead
of relying on static ones.  But does that deserve to be called an
architecture mismatch?  Would that make it hard to reuse libgit2 code
within git?

I'd be very interested in learning about more substantial differences
in approach.  Probably the two codebases could learn a lot from each
other's design.
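
For readers less familiar with the distinction, a minimal sketch of the two
styles in C. Both functions are hypothetical and taken from neither git nor
libgit2: the first hands out a pointer into a single static buffer, so a
second call overwrites the first result; the second writes into storage
owned by the caller and can be used from several threads or nested calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Static-buffer style: convenient for a short-lived command-line process,
 * but every call returns the same buffer and overwrites the last result.
 */
const char *ref_path_static(const char *branch)
{
	static char buf[256];

	assert(branch != NULL);
	snprintf(buf, sizeof(buf), "refs/heads/%s", branch);
	return buf;
}

/*
 * Reentrant style: the caller supplies the storage.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
int ref_path_r(char *out, size_t outlen, const char *branch)
{
	int n;

	assert(out != NULL && branch != NULL);
	n = snprintf(out, outlen, "refs/heads/%s", branch);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Calling ref_path_static("topic1") and then ref_path_static("topic2") leaves
both returned pointers referring to "refs/heads/topic2", which is the kind
of surprise a library with long-lived callers has to avoid.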

> Overall, you'd need balls of steel

Here I agree.

> HOWEVER. If you want to do something libgit2-related for the SoC
> (which would be awesome), there's still two options:
>
> a) Help us make the library more awesome by implementing new features!
> This task is the opposite the previous one; it's like full of unicorns
> and rainbows. You can choose one (or more) features we are missing,
> and see how to implement them in libgit2 while making them reentrant,
> threadsafe AND faster. It's not easy, but it's fucking cool. And you
> get to do a lot of micro-optimization if you're into that.

Note that if this is your kind of thing, you might consider sending
"libification patches" to modify the code in git while at it.  That
means free code review and free bugfixes from then on if your changes
are accepted.

> b) Write a minimal Git client using libgit2. Peff keeps bringing this
> up and I think it's a bangin' good idea. Write something small and
> 100% self contained in a C executable that runs everywhere with 0
> dependencies -- don't aim for full feature completion, just the basic
> stuff to interoperate with a Git repository.

I agree that this would be very neat, too.

> So, yeah. That's pretty much my libgit2-related advice for the SoC.

Thanks again, Vicent, for these very useful explanations.

> Best of luck with your application process with whatever project you decide,
> Vicent

Seconded. :)

Hope that helps,
Jonathan


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 18:06 ` Shawn Pearce
@ 2011-03-22 12:32   ` Pavel Raiskup
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Raiskup @ 2011-03-22 12:32 UTC (permalink / raw)
  To: git@vger.kernel.org

>> Histogram diff:
>> There is no mentor mentioned in [1]. Does it mean that there is no person
>> ..
>
> As the original author of HistogramDiff in JGit, and a contributor to
> C Git... I'm probably the best person to mentor this task. I'm really
> busy, so I didn't sign up to mentor anything else this year, but I
> think I would make time for this project.

Thanks for your answer and for your willingness to mentor this task.

>> There is a goal "Get this feature merged to the upstream git." -- but I have
>> one theoretical question -- what if the benchmarking/study of histogram diff
>> leads to conclusion that this algorithm will not be useful for upstream?
>
> Then the project doesn't merge. :-)
>
>> Does it mean "fail" in terms of GSOC? I have to think about it even if it
>> looks that there should be speedup quite obvious. I don't want to fail
>> a priory :).
>
> I don't think so
>
> I think the success of this project is if the code is of the quality
> that upstream would accept it, and if the final analysis data makes it
> clear whether or not its worth including. Its probably not worth
> including if its the same speed as the current Myers diff
> implementation from libxdiff or slower. But if its 2x faster, its
> probably worth merging. If the code quality is acceptable to the
> upstream maintainers.

This is exactly the kind of information I wanted to know. Of course
I don't want to produce code of unacceptable quality from any perspective.

And I think that you probably don't expect histogram diff to be significantly
faster in general :)

Thanks again - it is good to know that you, as the author of histogram diff,
are here. And sorry for my latency ..
[ot] this is because of my hectic school schedule right now - which is
actually not good :( I need to study the git source very deeply _NOW_ (I
wanted to reply earlier but..) [/ot]

Thanks also to Junio C Hamano, who gave almost the same answer here:
http://thread.gmane.org/gmane.comp.version-control.git/169498/focus=169516

Pavel

>> [1] https://git.wiki.kernel.org/index.php/SoC2011Ideas


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-21  1:27   ` Jonathan Nieder
@ 2011-03-22 16:43     ` Pavel Raiskup
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Raiskup @ 2011-03-22 16:43 UTC (permalink / raw)
  To: Vicent Marti, Jonathan Nieder; +Cc: git, Jeff King, Ramkumar Ramachandra

Hello,

Jonathan Nieder wrote:

> Vicent Marti wrote:
>
>> -- I'm
>> personally doubtful of anybody succeeding on doing that project during
>> the SoC,
>
> ...
> If someone wants to work on this, I'd be glad to talk over what would
> be needed to make a realistic proposal.
>
>> so I have very little interest on mentoring the task.
>
> That's okay, of course.  What's probably important for people
> considering this project is: would you be willing to answer questions
> and consider patches from a person working on this?  That is, do you
> consider the goal even worthwhile?
>
> I am probably not the best person to mentor this but if no one else
> wants to then I would be interested.

As I see it now, it could be too heavy for me to produce results as
good as would be needed. This is probably quite a difficult task for
someone just starting to contribute to git. I'd rather consider the other
git topics for now (but I'm not rejecting this idea yet).

> A good place to start is the initial commit e83c516 (Initial revision
> of "git", the information manager from hell, 2005-04-07).
> ........
> Heh, probably I didn't choose the best example. :)  A short article
> about this in Documentation/technical certainly wouldn't be a bad
> thing.
>
> In addition to "git log -S" as used above, I tend to find "git blame -L"
> helpful FWIW.  And people on the list can be helpful, too.

This "short" article is very helpful, thank you for that! I think
it can help all contributors (not only students) at the beginning
of their git journey.

>> b) Write a minimal Git client using libgit2. Peff keeps bringing this
>> up and I think it's a bangin' good idea. Write something small and
>> 100% self contained in a C executable that runs everywhere with 0
>> dependencies -- don't aim for full feature completion, just the basic
>> stuff to interoperate with a Git repository.
>
> I agree that this would be very neat, too.

The idea of git client based on libgit2 sounds VERY interesting. I'm
going to ask for some details in neighboring sub-thread.

>> Best of luck with your application process with whatever project you decide,
>> Vicent
>
> Seconded. :)

Thank you both. I'm not going to try other projects: there is not enough
time now to research them, and git is my only choice and desire.

Pavel


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 23:44   ` Jeff King
  2011-03-21  0:38     ` Vicent Marti
@ 2011-03-22 17:32     ` Pavel Raiskup
  2011-03-22 18:47       ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Pavel Raiskup @ 2011-03-22 17:32 UTC (permalink / raw)
  To: Vicent Marti, Jeff King; +Cc: git

Hi again!

This sounds probably like the most exciting task for me:

>> b) Write a minimal Git client using libgit2. Peff keeps bringing this
>> up and I think it's a bangin' good idea. Write something small and
>> 100% self contained in a C executable that runs everywhere with 0
>> dependencies -- don't aim for full feature completion, just the basic
>> stuff to interoperate with a Git repository. Clone, checkout, branch,
>> commit, push, pull, log. I would totally use that shit on my Windows
>> boxes. And since it'll be externally compatible with the original Git
>> client, we can reuse the Git unit tests to test libgit2. HA. Awesome!
>
> Yeah, I would be happy to mentor or co-mentor with Vicent on a project
> like that. Not only might it be useful to actually _use_, but my secret
> motive is that I'd like to start testing libgit2 using some of the
> regular git tests, both for interoperability and for performance.

Do you mean git tests in directory "/t"?

Could you give me a list of unit tests that could be reused? After a quick
look at git's test suite, it looks quite complex to reuse. I haven't
spent a lot of time studying the test suite, but a call like:

test_expect_success 'plain' 'command && command && ..'

evaluates the chain of commands given in the (2nd) string, and those
commands often call git as a utility with arguments. So even this very
simple test facility expects some command-line-interface behavior
from the tested utility. Is this how you want to test the new
libgit2-based tool? So this standalone utility is going to have the
same interface as git has -- a kind of substitution of git with "git2"
inside the test suite?

This will probably lead to some test suite changes, is that right?


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-22 17:32     ` Pavel Raiskup
@ 2011-03-22 18:47       ` Jeff King
  2011-03-22 19:18         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2011-03-22 18:47 UTC (permalink / raw)
  To: Pavel Raiskup; +Cc: Vicent Marti, git

On Tue, Mar 22, 2011 at 06:32:54PM +0100, Pavel Raiskup wrote:

> >Yeah, I would be happy to mentor or co-mentor with Vicent on a project
> >like that. Not only might it be useful to actually _use_, but my secret
> >motive is that I'd like to start testing libgit2 using some of the
> >regular git tests, both for interoperability and for performance.
> 
> Do you mean git tests in directory "/t"?

Yes.

> Could you give me a list of possible reusable unit tests? After a quick
> overview of test suite in git it looks quite complex to reuse. I haven't
> spent a lot of time studying test-suite, but calling:
> 
> test_expect_success 'plain' 'command && command && ..'
> 
> reinterprets chain of commands given in (2nd) string and in this
> commands is often called git as utility with arguments. Even in this
> very easy test feature is expected some command-line-interface behavior
> from tested utility.. Is this the way how do you want to test this new
> libgit2-like tool? So this standalone utility is going to have the
> same interface as git has -- kind of substitution of git with "git2"
> inside test suite?

Exactly. My plan was to implement a few of the simpler git commands (or
at least the basic parts of them) using libgit2, and then test them with
unmodified scripts from git's t/ directory.

Of course, many of the tests won't pass because of obscure features that
we haven't implemented. But that's OK. Even getting a partial list of
passing tests will be useful. And tests known not to work because of
unimplemented features can often be skipped (see the description of
GIT_SKIP_TESTS in t/README). Part of the project would be sorting out
which tests will be useful.
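
As a minimal sketch of that skipping mechanism (the patterns below are
placeholders picked for illustration, not a real list of tests that
libgit2 cannot pass): GIT_SKIP_TESTS holds space-separated shell
patterns, each matching either a whole script or one numbered test
inside a script.

```shell
# Placeholder patterns, not a real assessment of what libgit2 supports.
# Each pattern matches either a whole script (t9???) or one numbered
# test inside a script (t1400.3).
GIT_SKIP_TESTS='t9??? t1400.3'
export GIT_SKIP_TESTS

# From the t/ directory of a git checkout one would then run the suite,
# e.g. with "make"; here we only print the command.
echo "GIT_SKIP_TESTS='$GIT_SKIP_TESTS' make"
```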

It may also be necessary to use a mixture of git and libgit2 commands to
finish tests. For example, a test which is really about checking "log"
might use "commit", but "commit" hasn't been implemented yet. But it is
still useful information if we cheat and use regular git's "commit", but
test the libgit2 log command.
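
A rough sketch of such a mixed test follows. GIT2 is a hypothetical
name for the libgit2-based tool (it falls back to regular git here so
the sketch runs), and the test_expect_success below is only a stand-in
so the snippet works outside t/; it is not git's real test-lib.sh.

```shell
# Stand-in for git's test_expect_success, only for this sketch.
test_expect_success () {
	if (eval "$2") >/dev/null 2>&1
	then echo "ok - $1"
	else echo "not ok - $1"
	fi
}

# Hypothetical libgit2-based tool; defaults to regular git.
GIT2=${GIT2:-git}

# Scratch repository with a fixed identity so "commit" works anywhere.
tmp=$(mktemp -d) && cd "$tmp" && git init -q . &&
git config user.name "A U Thor" &&
git config user.email author@example.com

# Setup uses regular git ("add", "commit"); only "log" is under test.
test_expect_success 'log shows the subject of the commit' '
	echo content >file &&
	git add file &&
	git commit -q -m initial &&
	test "$($GIT2 log --format=%s)" = initial
'
```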

As far as which commands to start with, I would start with plumbing
commands like "update-index", "commit-tree", "update-ref", "rev-list",
etc.  Those are basic building blocks that have reasonably simple
interfaces, and they're easy to test. And once you start, I think it
will become more obvious where to go next (because some of the commands
build on the results of others).
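
To illustrate how these plumbing commands build on each other, here is
a small sketch using regular git in a scratch repository. The file
name, branch name and identity are made up for the example, and
"write-tree" is added to the list above because "commit-tree" needs a
tree object to work on.

```shell
# Scratch repository with a fixed identity so commit-tree works anywhere.
tmp=$(mktemp -d) && cd "$tmp" && git init -q . &&
git config user.name "A U Thor" &&
git config user.email author@example.com

echo hello >greeting
git update-index --add greeting                   # stage the file in the index
tree=$(git write-tree)                            # index -> tree object
commit=$(echo initial | git commit-tree "$tree")  # tree -> root commit
git update-ref refs/heads/plumbing "$commit"      # point a branch at it
git rev-list refs/heads/plumbing                  # lists that single commit
```

Each step consumes the output of the previous one, which is why a
reimplementation can be tested one command at a time.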

> This will probably lead to some test suite changes, is that right?

There may be modifications necessary to the test suite to make this
easier to do. But rather than forking the test suite and changing the
tests, I would much rather see whatever support is needed done in a
generalized way and merged to regular git.

-Peff


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-22 18:47       ` Jeff King
@ 2011-03-22 19:18         ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2011-03-22 19:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Pavel Raiskup, Vicent Marti, git

Jeff King <peff@github.com> writes:

> It may also be necessary to use a mixture of git and libgit2 commands to
> finish tests. For example, a test which is really about checking "log"
> might use "commit", but "commit" hasn't been implemented yet. But it is
> still useful information if we cheat and use regular git's "commit", but
> test the libgit2 log command.

Absolutely, and I don't even think that is "cheating"; it is merely a
natural way to work incrementally.

> As far as which commands to start with, I would start with plumbing
> commands like "update-index", "commit-tree", "update-ref", "rev-list",
> etc.  Those are basic building blocks that have reasonably simple
> interfaces, and they're easy to test. And once you start, I think it
> will become more obvious where to go next (because some of the commands
> build on the results of others).
>
>> This will probably lead to some test suite changes, is that right?

Some tests _might_ depend on implementation details that we would rather
they did not, but I don't think there are too many of them, unless you
count the ones that use a "test-<something>" helper binary that links
with libgit.a to make direct calls to the internals.  I would suggest
treating a failure as a newly uncovered bug in the new implementation by
default, and discussing the tests that do depend on implementation
details of C git case by case, so that they can be fixed.


* Re: Histogram diff, libgit2 enhancement, libgit2 => git merge (GSOC)
  2011-03-20 21:01 ` Vicent Marti
  2011-03-20 23:44   ` Jeff King
  2011-03-21  1:27   ` Jonathan Nieder
@ 2011-03-23  0:24   ` Vincent van Ravesteijn
  2 siblings, 0 replies; 13+ messages in thread
From: Vincent van Ravesteijn @ 2011-03-23  0:24 UTC (permalink / raw)
  To: Vicent Marti; +Cc: git


> b) Write a minimal Git client using libgit2. Peff keeps bringing this
> up and I think it's a bangin' good idea. Write something small and
> 100% self contained in a C executable that runs everywhere with 0
> dependencies -- don't aim for full feature completion, just the basic
> stuff to interoperate with a Git repository. Clone, checkout, branch,
> commit, push, pull, log. I would totally use that shit on my Windows
> boxes. And since it'll be externally compatible with the original Git
> client, we can reuse the Git unit tests to test libgit2. HA. Awesome!
>

I dream of having a platform-independent GUI based on libgit2
which could be used to manage a large project. Set up the workflow in the
app, requiring only single mouse clicks to promote a topic branch into
the stable series. Have a button to merge all maint-branch updates into
the other branches. And more..

In order to come up with a possible workflow for our project, I have
been checking out how Git is managed. I was a little bit disappointed
that Junio uses some 'home-brewed' scripts for Git. I don't want to write
them myself (on Windows).

I'm happy to see that the 'vger' people are supporting libgit2.

Anyway, when I do have some time, I am willing to contribute to the 
libgit2 project.

Greetings,

Vincent


