* GSoC Application [ Parallelism + Git.pm ]
@ 2012-03-18 7:36 Subho Banerjee
2012-03-18 16:31 ` Jakub Narebski
2012-03-21 12:47 ` Thomas Rast
0 siblings, 2 replies; 7+ messages in thread
From: Subho Banerjee @ 2012-03-18 7:36 UTC
To: git
Hello,
I am a fourth-year undergraduate student of Computer Science and
Engineering at LNMIIT, India. I am very comfortable coding in
C/C++ and Perl, and I am interested in applying to the Google Summer
of Code program for the "git" project. I have been a git user for a
couple of years now, though I was introduced to GitHub only recently.
Although I have never tinkered with the git source code, I
am a fast learner and I would love to contribute towards making git
better.
I had a look at the "Ideas" page on the GSoC website and I really
liked two particular project ideas, in which I believe I can
contribute to a larger extent.
* Improving parallelism in various commands
* Modernizing and expanding Git.pm
Over the last two summers, I was an intern at the European
Organization for Nuclear Research (CERN). There I worked mainly on two
projects (though neither of them used git for version control), and I
believe that experience applies directly to the tasks I mentioned
above. First, I worked on HPC Monte Carlo simulations and on making
them faster using pthreads, OpenMP and CUDA. Second, I worked on a
grid middleware solution written entirely in Perl.
From what I understand of these tasks --
* In the first one, which aims to parallelize certain commands in
git, I believe the major challenge will be to actually find a large
list of commands that can be parallelized. In addition to the commands
mentioned in the Ideas page, the only other place I can currently think
of for exploiting parallelism is traversing the commit tree when
cloning a repository. I would really like it if someone could suggest
more places where this sort of parallelism might be usable, so that I
can make a more complete application. I believe one of
the major difficulties I will face initially is my unfamiliarity with
the code, which makes it a little difficult to find the commands that
might perform better with parallelism.
* For the second one, which aims at improving the Git Perl module, I
tried looking around for it on the net. I was a little confused
since I could not make out which module this is on CPAN. Is it one
of the Git::* modules, or is it all of them? I ask because the
functionality of Git::Config and Git::Commit as mentioned in the
Ideas page seems to already be there in the Git::Repository module on CPAN.
Could someone please clarify this?
I would really appreciate any ideas or advice for making my
application for GSoC 2012 better.
Cheers,
Subho.
* Re: GSoC Application [ Parallelism + Git.pm ]
2012-03-18 7:36 GSoC Application [ Parallelism + Git.pm ] Subho Banerjee
@ 2012-03-18 16:31 ` Jakub Narebski
2012-03-21 18:52 ` Subho Banerjee
2012-03-21 12:47 ` Thomas Rast
1 sibling, 1 reply; 7+ messages in thread
From: Jakub Narebski @ 2012-03-18 16:31 UTC
To: Subho Banerjee; +Cc: git
Subho Banerjee <subs.zero@gmail.com> writes:
[...]
> I had a look at the "Ideas" page on the GSoC website and I really
> liked two particular project ideas, in which I believe I can
> contribute to a larger extent.
>
> * Improving parallelism in various commands
> * Modernizing and expanding Git.pm
[...]
> From what I understand of these tasks --
[...]
> * For the second one, which aims at improving the Git Perl module, I
> tried looking around for it on the net. I was a little confused
> since I could not make out which module this is on CPAN. Is it one
> of the Git::* modules, or is it all of them? I ask because the
> functionality of Git::Config and Git::Commit as mentioned in the
> Ideas page seems to already be there in the Git::Repository module on CPAN.
> Could someone please clarify this?
The "Modernizing and expanding Git.pm" project refers to the Git
module in the git sources[1], which is used by git commands implemented in
Perl such as git-svn, git-send-email, and the interactive part of git-add.
It is not on CPAN (though, if you feel like it, putting it on CPAN might
be part of this project; in that case it must be "dual-lived").
[1]: http://repo.or.cz/w/git.git/blob/HEAD:/perl/Git.pm
http://git.kernel.org/?p=git/git.git;a=blob;hb=HEAD;f=perl/Git.pm
https://github.com/git/git/blob/master/perl/Git.pm
You can of course take inspiration and code (if it is under a compatible
license) from the various Git::* modules on CPAN to implement the
"expanding" part of this project.
Note that Git.pm must remain extremely portable, which includes
ActivePerl on MS Windows (msysGit or Cygwin). Use of non-core modules
(for 5.8.0) should probably also be limited.
> I would really appreciate any ideas or advice for making my
> application for GSoC 2012 better.
HTH
--
Jakub Narebski
* Re: GSoC Application [ Parallelism + Git.pm ]
2012-03-18 16:31 ` Jakub Narebski
@ 2012-03-21 18:52 ` Subho Banerjee
2012-03-22 13:33 ` Jakub Narebski
0 siblings, 1 reply; 7+ messages in thread
From: Subho Banerjee @ 2012-03-21 18:52 UTC
To: Jakub Narebski; +Cc: git
Hello Jakub,
I had some time to look through the perl module in the Git sources and
I wanted to summarize the changes that need to be made -
[Primary Task]
[1] Move exception handling from Error::Simple to Try::Tiny and Exception::Class
[Additional]
[2] A Git::Config module that parses the .gitconfig file and the
.git/config file in each repository (is it one of these files or both?)
[3] Parsing Tree and Commit objects and then traversing the tree
structure in Perl through a Git::Commit module.
[4] Cleaning up and improving the API.
In general, move towards a module that can access and change data in
the configuration and commit status using Perl instead of the fork and
IPC being used now.
Is this what you expect as a part of the GSoC work? Could you please
tell me if I am missing something?
Cheers,
Subho.
* Re: GSoC Application [ Parallelism + Git.pm ]
2012-03-21 18:52 ` Subho Banerjee
@ 2012-03-22 13:33 ` Jakub Narebski
0 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2012-03-22 13:33 UTC
To: Subho Banerjee; +Cc: git
On Wed, 21 Mar 2012, Subho Banerjee wrote:
> I had some time to look through the perl module in the Git sources and
> I wanted to summarize the changes that need to be made -
>
> [Primary Task]
> [1] Move exception handling from Error::Simple to Try::Tiny and Exception::Class
Well, that is not as much "primary task" as "minimal scope"...
We also have to decide whether we want to keep a compatibility layer, including
git_cmd_try. Git commands in Perl can be rewritten to use Try::Tiny
directly, but I wonder whether there are out-of-tree Perl scripts and modules
that use the Git module and would be broken by not preserving backward
compatibility.
Another issue of note is whether we want to bundle all non-core Git.pm
prerequisites, the way private-Error.pm is bundled now (though I think they
should go in 'inc/' or something).
> [Additional]
> [2] A Git::Config module that parses the .gitconfig file and the
> .git/config file in each repository (Is it one of these files or both?)
I don't think parsing git config file format in Perl is a good idea.
What I had in mind was to use `git config -l -z` output, reading all
configuration at once with a single git command.
Note that this would require converting values to specific types, e.g.
turning true values of git config (e.g. the string "true") into Perl
true (1). The --int, --bool, --bool-or-int and --path types should be easy;
the problem could be with --get-colorbool and --get-color.
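A minimal sketch of that approach, written in Python rather than Perl purely for illustration. The function names (parse_config_z, to_bool, to_int) are hypothetical and not part of Git.pm. It assumes the `git config -l -z` record layout of key, newline, value, NUL, with a key that has no value appearing as a bare key before the NUL, and it covers only the bool and int conversions:

```python
from typing import List, Optional, Tuple

# Size suffixes accepted for integer config values (multiples of 1024).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_config_z(data: bytes) -> List[Tuple[str, Optional[str]]]:
    """Split `git config -l -z` output into (key, value) pairs.

    Records end with NUL; the key is separated from the value by the
    first newline. A record without a newline is a key with no value.
    """
    entries = []
    for record in data.split(b"\0"):
        if not record:
            continue  # empty chunk after the final NUL
        key, sep, value = record.partition(b"\n")
        entries.append(
            (key.decode("utf-8"), value.decode("utf-8") if sep else None)
        )
    return entries


def to_int(value: str) -> int:
    """Convert an integer value with an optional k/m/g suffix."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def to_bool(value: Optional[str]) -> bool:
    """Convert a config value to a boolean."""
    if value is None:
        return True  # a key given without "= value" counts as true
    text = value.strip().lower()
    if text in ("true", "yes", "on"):
        return True
    if text in ("false", "no", "off", ""):
        return False
    return to_int(text) != 0  # numeric values: non-zero is true
```

Partitioning on the first newline only keeps multi-line values intact, which is the main reason to prefer the -z form over plain `git config -l`.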
Optionally (if possible) make it so that a Git / Git::Repo object uses either
one "git config --get <key>" call for each $git->config(<key>) request, or
cached values from a single "git config -l -z" call via Git::Config, depending
on constructor options ('lazy_config' => 1). Then we could fall back
to one command per access for unknown types...
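The two access strategies could be sketched as follows, again in Python for illustration only. The ConfigReader class, its cache_all option and the injectable runner are hypothetical names; they stand in for the constructor option mentioned above and do not describe any existing Git.pm interface. Keys are expected in the form git itself prints them, and keys without a value are simplified to an empty string:

```python
import subprocess
from typing import Callable, Dict, List, Optional


def run_git(args: List[str]) -> bytes:
    """Run a git command and return its standard output."""
    result = subprocess.run(["git", *args], check=True, stdout=subprocess.PIPE)
    return result.stdout


class ConfigReader:
    """Read config values one command per key, or from a one-shot cache."""

    def __init__(self, cache_all: bool = False,
                 runner: Callable[[List[str]], bytes] = run_git) -> None:
        self.cache_all = cache_all
        self.runner = runner
        self._cache: Optional[Dict[str, str]] = None

    def _load(self) -> Dict[str, str]:
        # One "git config -l -z" call; for repeated keys the last one wins.
        cache: Dict[str, str] = {}
        for record in self.runner(["config", "-l", "-z"]).split(b"\0"):
            if record:
                key, _, value = record.partition(b"\n")
                cache[key.decode("utf-8")] = value.decode("utf-8")
        return cache

    def get(self, key: str) -> Optional[str]:
        if self.cache_all:
            if self._cache is None:
                self._cache = self._load()
            return self._cache.get(key)
        try:
            # One "git config --get" call per request.
            out = self.runner(["config", "-z", "--get", key])
        except subprocess.CalledProcessError:
            return None  # git exits non-zero when the key is not set
        return out.rstrip(b"\0").decode("utf-8")
```

Passing the runner in as a parameter keeps the class testable without a repository; a caller that needs a type the cache cannot convert could still issue a single typed command for that key.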
> [3] Parsing Tree and Commit objects and then traverse the tree
> structure in Perl through a Git::Commit module.
That means 'commit', 'tag' and 'tree' objects, plus parsing and formatting ident fields.
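A rough sketch of parsing a raw commit object and its ident fields, in Python for illustration. The function names are hypothetical. It assumes the raw commit layout of header lines, a blank line, then the message, with header continuation lines starting with a space, and an ident of the form "Name <email> timestamp timezone":

```python
import re
from typing import Dict, List, Tuple

# "Name <email> 1332336720 +0100"
IDENT_RE = re.compile(
    r"^(?P<name>.*?) <(?P<email>[^>]*)> (?P<time>\d+) (?P<tz>[+-]\d{4})$"
)


def parse_ident(ident: str) -> Dict[str, object]:
    """Split an author/committer/tagger ident into its fields."""
    match = IDENT_RE.match(ident)
    if match is None:
        raise ValueError("malformed ident: %r" % ident)
    return {
        "name": match.group("name"),
        "email": match.group("email"),
        "time": int(match.group("time")),
        "tz": match.group("tz"),
    }


def format_ident(parsed: Dict[str, object]) -> str:
    """Inverse of parse_ident."""
    return "{name} <{email}> {time} {tz}".format(**parsed)


def parse_commit(raw: str) -> Tuple[List[Tuple[str, str]], str]:
    """Parse raw commit text into (headers, message).

    Headers are kept as an ordered list because "parent" may repeat.
    """
    header_text, _, message = raw.partition("\n\n")
    headers: List[Tuple[str, str]] = []
    for line in header_text.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation of the previous header (multi-line values).
            name, value = headers[-1]
            headers[-1] = (name, value + "\n" + line[1:])
        else:
            name, _, value = line.partition(" ")
            headers.append((name, value))
    return headers, message
```

The same header loop would apply to 'tag' objects, which use "object", "type", "tag" and "tagger" headers instead.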
Note that there is a difference between 'raw' output format ("git cat-file"),
and e.g. "git ls-tree" for 'tree' objects.
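To make the difference concrete, here is a sketch of both parsers in Python (illustrative only; the function names are hypothetical). It assumes a raw tree entry is "mode SP name NUL" followed by a 20-byte binary SHA-1, and a `git ls-tree -z` record is "mode SP type SP sha TAB name NUL":

```python
from typing import List, Tuple


def parse_raw_tree(data: bytes) -> List[Tuple[str, str, bytes]]:
    """Parse the raw content of a tree object into (mode, sha, name)."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        nul = data.index(b"\0", space)
        mode = data[pos:space].decode("ascii")
        name = data[space + 1:nul]
        sha = data[nul + 1:nul + 21].hex()  # 20 raw bytes, shown as hex
        entries.append((mode, sha, name))
        pos = nul + 21
    return entries


def parse_ls_tree_z(data: bytes) -> List[Tuple[str, str, str, bytes]]:
    """Parse `git ls-tree -z` output into (mode, type, sha, name)."""
    entries = []
    for record in data.split(b"\0"):
        if not record:
            continue  # empty chunk after the final NUL
        meta, _, name = record.partition(b"\t")
        mode, obj_type, sha = meta.decode("ascii").split(" ")
        entries.append((mode, obj_type, sha, name))
    return entries
```

Note that the raw form carries no object type and writes directory modes without a leading zero ("40000"), while ls-tree prints the type and pads the mode ("040000"). Names are kept as bytes because paths need not be valid UTF-8.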
If by 'traverse the tree structure in Perl through a Git::Commit module.'
you mean traversing the DAG of revisions, then I think it is out of scope.
And there is also the matter of parsing diff output (raw/tree, numstat,
patchset).
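As a sketch of what parsing two of these formats might involve, in Python for illustration (function names are hypothetical). It assumes the non -z line formats: numstat lines are "added TAB deleted TAB path" with "-" for binary files, and raw lines are ":oldmode newmode oldsha newsha status TAB path[ TAB path]". Quoting of unusual path names and the rename notation in numstat paths are not handled:

```python
from typing import Dict, List, Optional, Tuple


def parse_numstat(output: str) -> List[Tuple[Optional[int], Optional[int], str]]:
    """Parse `--numstat` lines into (added, deleted, path).

    Binary files report "-" for both counts; those become None.
    """
    stats = []
    for line in output.splitlines():
        if not line:
            continue
        added, deleted, path = line.split("\t", 2)
        stats.append((
            None if added == "-" else int(added),
            None if deleted == "-" else int(deleted),
            path,
        ))
    return stats


def parse_raw_diff(output: str) -> List[Dict[str, object]]:
    """Parse `--raw` lines into dictionaries, one per changed path."""
    changes = []
    for line in output.splitlines():
        if not line.startswith(":"):
            continue  # not a raw diff line
        meta, _, paths = line[1:].partition("\t")
        old_mode, new_mode, old_sha, new_sha, status = meta.split(" ")
        changes.append({
            "old_mode": old_mode,
            "new_mode": new_mode,
            "old_sha": old_sha,
            "new_sha": new_sha,
            "status": status,            # e.g. "M", "A", "D", "R100"
            "paths": paths.split("\t"),  # two paths for renames and copies
        })
    return changes
```

A more robust version would read the -z variants of both formats, which avoid path quoting altogether at the cost of a slightly different record layout.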
> [4] Cleaning up and improving the API.
> In general, move towards a module that can access and change data in
> the configuration and commit status using Perl instead of the fork and
> IPC being used now.
Yes, so that most if not all operations can be done on the level of Git.pm
methods or subroutines, and not having to invoke git commands and parse
their output by hand.
This probably needs to be done by examining what git commands in Perl
need.
> Is this what you expect as a part of the GSoC work? Could you please
> tell me if I am missing something.
Another task could be CPAN-ification of Git.pm. This _could_ include
creating a separate repository for Git.pm (with all that entails),
and subtree-merging it into git.git the way git-gui and gitk are.
Yet another task could be making all git commands in Perl use Git.pm
(well, at least those actively maintained).
Perhaps also cleaning up those Perl::Critic warnings that make sense.
--
Jakub Narebski
Poland
* Re: GSoC Application [ Parallelism + Git.pm ]
2012-03-18 7:36 GSoC Application [ Parallelism + Git.pm ] Subho Banerjee
2012-03-18 16:31 ` Jakub Narebski
@ 2012-03-21 12:47 ` Thomas Rast
2012-03-21 13:53 ` Subho Banerjee
1 sibling, 1 reply; 7+ messages in thread
From: Thomas Rast @ 2012-03-21 12:47 UTC
To: Subho Banerjee; +Cc: git
Subho Banerjee <subs.zero@gmail.com> writes:
> * In the first one, which aims to parallelize certain commands in
> git, I believe the major challenge will be to actually find a large
> list of commands that can be parallelized. In addition to the commands
> mentioned in the Ideas page, the only other place I can currently think
> of for exploiting parallelism is traversing the commit tree when
> cloning a repository. I would really like it if someone could suggest
> more places where this sort of parallelism might be usable, so that I
> can make a more complete application. I believe one of
> the major difficulties I will face initially is my unfamiliarity with
> the code, which makes it a little difficult to find the commands that
> might perform better with parallelism.
Please read my reply to Felipe in the other thread:
http://thread.gmane.org/gmane.comp.version-control.git/193352/focus=193574
as I'd have to repeat myself.
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: GSoC Application [ Parallelism + Git.pm ]
2012-03-21 12:47 ` Thomas Rast
@ 2012-03-21 13:53 ` Subho Banerjee
2012-03-21 14:10 ` Nguyen Thai Ngoc Duy
0 siblings, 1 reply; 7+ messages in thread
From: Subho Banerjee @ 2012-03-21 13:53 UTC
To: Thomas Rast; +Cc: git
Hello,
Thanks for the links. Those really helped.
Will the threading that has to be done need to be platform
independent? For example, will it be used on Windows without Cygwin?
Cheers,
Subho.
* Re: GSoC Application [ Parallelism + Git.pm ]
2012-03-21 13:53 ` Subho Banerjee
@ 2012-03-21 14:10 ` Nguyen Thai Ngoc Duy
0 siblings, 0 replies; 7+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-21 14:10 UTC
To: Subho Banerjee; +Cc: Thomas Rast, git
On Wed, Mar 21, 2012 at 8:53 PM, Subho Banerjee <subs.zero@gmail.com> wrote:
> Hello,
> Thanks for the links. Those really helped.
> Will the threading that has to be done need to be platform
> independent? For example, will it be used on Windows without Cygwin?
Yes. We simulate a subset of the pthreads API for Windows in
compat/win32/pthread.[ch]. Stick to those functions and you will be fine
(or add some more, of course).
--
Duy