* [GSoC] Applying for conversion scripts to builtins
@ 2015-03-16 16:49 Yurii Shevtsov
2015-03-16 18:03 ` Matthieu Moy
2015-03-17 0:22 ` Paul Tan
0 siblings, 2 replies; 6+ messages in thread
From: Yurii Shevtsov @ 2015-03-16 16:49 UTC (permalink / raw)
To: git
I'm going to write a proposal for this idea. As far as I know, a good
proposal should contain a timeline and to-do estimates. What should I
write in my proposal, given that there is no clear plan for converting
scripts to builtins? Thanks in advance!
* Re: [GSoC] Applying for conversion scripts to builtins
2015-03-16 16:49 [GSoC] Applying for conversion scripts to builtins Yurii Shevtsov
@ 2015-03-16 18:03 ` Matthieu Moy
2015-03-17 0:22 ` Paul Tan
1 sibling, 0 replies; 6+ messages in thread
From: Matthieu Moy @ 2015-03-16 18:03 UTC (permalink / raw)
To: Yurii Shevtsov; +Cc: git
Yurii Shevtsov <ungetch@gmail.com> writes:
> I'm going to write for this idea. As I know good proposal should
> contain timeline and Todo estimations. What should I write in my
> proposal, since there is no clear plan for converting scripts to
> builtins. Thanks in advance!
The fact that there is no clear plan is part of the plan. The idea is
that how much can be converted depends highly on how the GSoC goes. You
already saw with your microproject that something apparently easy can be
harder than it seems.
See this thread for a discussion on the topic:
http://thread.gmane.org/gmane.comp.version-control.git/264050/focus=21366
Now, it's up to you to make a good proposal, i.e. both convince people
that you can do a good job, and OTOH be realistic about what can be done
in a GSoC.
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
* Re: [GSoC] Applying for conversion scripts to builtins
2015-03-16 16:49 [GSoC] Applying for conversion scripts to builtins Yurii Shevtsov
2015-03-16 18:03 ` Matthieu Moy
@ 2015-03-17 0:22 ` Paul Tan
2015-03-17 1:34 ` Duy Nguyen
2015-03-17 11:56 ` Johannes Schindelin
1 sibling, 2 replies; 6+ messages in thread
From: Paul Tan @ 2015-03-17 0:22 UTC (permalink / raw)
To: Yurii Shevtsov; +Cc: Git List, Matthieu Moy, johannes.schindelin
Hi,
On Tue, Mar 17, 2015 at 12:49 AM, Yurii Shevtsov <ungetch@gmail.com> wrote:
> I'm going to write for this idea. As I know good proposal should
> contain timeline and Todo estimations. What should I write in my
> proposal, since there is no clear plan for converting scripts to
> builtins. Thanks in advance!
I'm actually writing a proposal for the same topic because I somehow
ended up with a working prototype of git-pull.c while exploring the
internal git API ;). It's not ready as a patch yet, though: there are
some problems with git's internal API that cause e.g. double-free
errors, and too much code complexity because required functionality is
not exposed by the builtins. Both will have to be addressed.
Generally, it would be easy to convert any shell script to C by just
using the run_command* functions (and in fewer lines of code), but that
would not be taking advantage of the potential benefits in porting
shell scripts to C. To summarize the (ideal) requirements:
* zero spawning of processes so that the internal object/config/index
cache can be taken advantage of. (and to avoid the process spawning
overhead, which is relatively large on e.g. Windows)
* avoid needless parsing since we have direct access to the C data
structures.
* use the internal API as much as possible: share code between the
builtins (e.g. fmt-merge-msg.c, exposed in fmt-merge-msg.h) in order
to reduce code complexity.
The biggest wins would definitely be portability, but there may be
performance improvements, though they are theoretical at this point.
I'm not exactly sure if the above requirements are sane, which is why
I'm also CC-ing Dscho who knows the problems of git on Windows more
than I do.
Regards,
Paul
* Re: [GSoC] Applying for conversion scripts to builtins
2015-03-17 0:22 ` Paul Tan
@ 2015-03-17 1:34 ` Duy Nguyen
2015-03-17 11:56 ` Johannes Schindelin
1 sibling, 0 replies; 6+ messages in thread
From: Duy Nguyen @ 2015-03-17 1:34 UTC (permalink / raw)
To: Paul Tan; +Cc: Yurii Shevtsov, Git List, Matthieu Moy, Johannes Schindelin
On Tue, Mar 17, 2015 at 7:22 AM, Paul Tan <pyokagan@gmail.com> wrote:
> Hi,
>
> On Tue, Mar 17, 2015 at 12:49 AM, Yurii Shevtsov <ungetch@gmail.com> wrote:
>> I'm going to write for this idea. As I know good proposal should
>> contain timeline and Todo estimations. What should I write in my
>> proposal, since there is no clear plan for converting scripts to
>> builtins. Thanks in advance!
>
> I'm actually writing a proposal for the same topic because I somehow
> ended up with a working prototype of git-pull.c while exploring the
> internal git API ;). It's not ready as a patch yet though as there are
> some problems with git's internal API which causes e.g. double free
> errors and too much code complexity due to required functionality not
> being exposed by builtins, which will have to be addressed.
>
> Generally, it would be easy to convert any shell script to C by just
> using the run_command* functions (and in less lines of code), but that
> would not be taking advantage of the potential benefits in porting
> shell scripts to C. To summarize the (ideal) requirements:
While run_command() is not ideal, it would be a good intermediate
state where you can verify with the test suite that the C skeleton
after rewrite is working ok. Then you can start killing run_command()
in subsequent patches. That would make the code much easier to review, too.
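The staged approach described above can be sketched roughly as follows. This is only an illustration in Python (git's builtins are C), and every name in it (`count_lines_spawn`, `count_lines_inprocess`, `check`) is hypothetical: stage 1 delegates to a child process the way a run_command()-based skeleton would, stage 2 does the same work in-process, and one shared check stands in for the test suite that verifies both.

```python
import os
import subprocess
import sys
import tempfile


def count_lines_spawn(path):
    """Stage 1: delegate the work to a child process (a stand-in for a
    run_command()-based skeleton)."""
    child_code = "import sys; print(sum(1 for _ in open(sys.argv[1], 'rb')))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, path],
        capture_output=True, text=True, check=True,
    )
    # The child's answer comes back as text and has to be parsed again.
    return int(result.stdout.strip())


def count_lines_inprocess(path):
    """Stage 2: the same behaviour with no child process."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check(implementation, path, expected):
    """Plays the role of the test suite: one check, either stage."""
    return implementation(path) == expected


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"one\ntwo\nthree\n")
    try:
        for stage in (count_lines_spawn, count_lines_inprocess):
            print(stage.__name__, check(stage, tmp.name, 3))
    finally:
        os.unlink(tmp.name)
```

Because both stages pass the same check, the second can replace the first in a later patch without touching the tests.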
--
Duy
* Re: [GSoC] Applying for conversion scripts to builtins
2015-03-17 0:22 ` Paul Tan
2015-03-17 1:34 ` Duy Nguyen
@ 2015-03-17 11:56 ` Johannes Schindelin
2015-03-17 18:38 ` Junio C Hamano
1 sibling, 1 reply; 6+ messages in thread
From: Johannes Schindelin @ 2015-03-17 11:56 UTC (permalink / raw)
To: Paul Tan; +Cc: Yurii Shevtsov, Git List, Matthieu Moy
Hi Paul,
On 2015-03-17 01:22, Paul Tan wrote:
> On Tue, Mar 17, 2015 at 12:49 AM, Yurii Shevtsov <ungetch@gmail.com> wrote:
>
> Generally, it would be easy to convert any shell script to C by just
> using the run_command* functions (and in less lines of code), but that
> would not be taking advantage of the potential benefits in porting
> shell scripts to C. To summarize the (ideal) requirements:
>
> * zero spawning of processes so that the internal object/config/index
> cache can be taken advantage of. (and to avoid the process spawning
> overhead which is relative large in e.g. Windows)
Spawning definitely uses up many more resources on Windows.
However, spawning a full-fledged Bash requires MSys (or soon MSys2) to spin up an entire POSIX emulation layer. This costs us dearly. For example, when I run the t3404 test (which exercises scripting heavily, what with `git rebase -i` being implemented as a shell script) on MacOSX, it takes roughly a minute to complete. On a comparable Windows machine, it takes roughly 12 minutes to complete.
Therefore, I would wager a bet that just the mere conversion of a shell script into even a primitive `run_command()`-based builtin would help performance on Windows in a noticeable manner.
Of course, it would be *even nicer* to avoid the spawning altogether.
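To get a feel for the spawning cost on a given machine, a rough measurement along these lines may help. It is a sketch in Python, not git code: the child is an interpreter that does nothing, `noop` is a hypothetical stand-in for an in-process call, and the numbers will vary widely between platforms.

```python
import subprocess
import sys
import time


def noop():
    """Stand-in for work done by an in-process function call."""
    return 0


def time_spawns(count):
    """Seconds spent starting `count` child interpreters that do nothing."""
    start = time.perf_counter()
    for _ in range(count):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_calls(count):
    """Seconds spent making `count` plain function calls."""
    start = time.perf_counter()
    for _ in range(count):
        noop()
    return time.perf_counter() - start


if __name__ == "__main__":
    runs = 10
    print(f"{runs} spawns: {time_spawns(runs):.4f}s")
    print(f"{runs} calls:  {time_calls(runs):.6f}s")
```

This measures only process start-up, not the additional cost of a POSIX emulation layer, so it probably understates the gap for shell scripts on Windows.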
> * avoid needless parsing since we have direct access to the C data
> structures.
True that. Turning SHA-1s into strings, spawning, and reparsing the same SHA-1 is quite a lot of unnecessary churn.
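The round trip in question looks roughly like this. The sketch is Python rather than C, the helper names are hypothetical, and `object_id` is a plain SHA-1 of the input, not git's object hashing (which also covers a type and length header).

```python
import hashlib


def object_id(data):
    """Raw 20-byte SHA-1 of `data`."""
    return hashlib.sha1(data).digest()


def to_text(oid):
    """What the parent has to do before handing the id to a child."""
    return oid.hex()


def from_text(text):
    """What the child has to do to get the binary id back."""
    if len(text) != 40:
        raise ValueError("expected 40 hex digits")
    return bytes.fromhex(text)


if __name__ == "__main__":
    oid = object_id(b"example payload")
    text = to_text(oid)
    print(len(oid), len(text), from_text(text) == oid)
```

An in-process caller can pass the 20 raw bytes along directly and skip both conversions, along with the validation the receiving side needs.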
The biggest benefit of avoiding needless parsing, however, is not performance. It is avoiding quoting issues. This is particularly so on Windows, where Git is sometimes called from outside a shell environment, where we have to deal with inconsistent quoting because it is every Windows program's own job to parse the command-line, including the quoting.
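A small illustration of how argument boundaries get lost once a command line is flattened to a string and parsed again. It is a Python sketch with hypothetical helper names; `shlex` follows POSIX shell rules, which is an assumption here and not how Windows programs parse their command lines.

```python
import json
import shlex
import subprocess
import sys

# Child program that reports the arguments it actually received.
ECHO_ARGV = "import json, sys; print(json.dumps(sys.argv[1:]))"


def naive_round_trip(args):
    """Join with spaces, then split on whitespace: boundaries are lost."""
    return " ".join(args).split()


def quoted_round_trip(args):
    """Quote on the way out and parse on the way in (POSIX shell rules)."""
    return shlex.split(shlex.join(args))


def child_sees(args):
    """Pass the list straight to a child, with no shell in between."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGV, *args],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    args = ["commit", "-m", "fix: don't crash", "my file.txt"]
    print(naive_round_trip(args))
    print(quoted_round_trip(args))
    print(child_sees(args))
```

The quoted round trip only works because both sides agree on the quoting rules; a call that never leaves the process has nothing to quote in the first place.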
> * use the internal API as much as possible: share code between the
> builtins (e.g. fmt-merge-msg.c, exposed in fmt-merge-msg.h) in order
> to reduce code complexity.
That is definitely something that even the Git maintainer should be interested in (he does not touch Windows, therefore the performance differences do not concern him): by sharing code paths between different subcommands, you ensure that you have to fix problems only once, not twice or more.
Concrete example: on Windows, we have file locking issues because files that are in use cannot be deleted. For that reason, we have Windows-specific code that is "nice" by trying harder to delete files, giving programs a little time to let their locks go. This locking issue happens also when a virus scanner "uses", say, the .git-rewrite/revs file that was written by `git filter-branch`, while said shell script already wants to delete the file because it is obsolete. If `git filter-branch` were a builtin, the bug would already be fixed due to our override of the `unlink()` function in C. Now we have to fix that bug separately because `filter-branch` is a shell script.
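The "try harder to delete" idea can be sketched as below. This is a Python illustration only, not git's actual Windows compatibility code: the function name, the attempt count, and the delays are all assumptions, and the `remove` and `sleep` parameters exist just to make the sketch testable.

```python
import os
import tempfile
import time


def unlink_with_retry(path, attempts=5, base_delay=0.01,
                      remove=os.unlink, sleep=time.sleep):
    """Delete `path`, waiting a little longer after each failed attempt.

    Returns True once the file is gone, False if it is still locked
    after `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            remove(path)
            return True
        except FileNotFoundError:
            return True  # someone else already removed it
        except PermissionError:
            # Typically another program still has the file open.
            if attempt + 1 < attempts:
                sleep(base_delay * (attempt + 1))
    return False


if __name__ == "__main__":
    handle, path = tempfile.mkstemp()
    os.close(handle)
    print(unlink_with_retry(path), os.path.exists(path))
```

Putting the retry in one shared function is the point of the example: every caller that deletes files through it gets the fix, whereas a script calling `rm` directly does not.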
> The biggest wins would definitely be portability, but there may be
> performance improvements, though they are theoretical at this point.
>
> I'm not exactly sure if the above requirements are sane, which is why
> I'm also CC-ing Dscho who knows the problems of git on Windows more
> than I do.
Thanks for bringing this to my attention. I hope I managed to add useful information to the discussion.
Ciao,
Johannes
* Re: [GSoC] Applying for conversion scripts to builtins
2015-03-17 11:56 ` Johannes Schindelin
@ 2015-03-17 18:38 ` Junio C Hamano
0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2015-03-17 18:38 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Paul Tan, Yurii Shevtsov, Git List, Matthieu Moy
Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> Therefore, I would wager a bet that just the mere conversion of a
> shell script into even a primitive `run_command()`-based builtin would
> help performance on Windows in a noticeable manner.
As you correctly allege, if a patch rewrote a shell-scripted
porcelain by using series of run_command() and doing nothing else, I
would have asked "is that an improvement?", without knowing that.
> Of course, it would be *even nicer* to avoid the spawning altogether.
Yeah, that, too ;-)
> The biggest benefit of avoiding needless parsing, however, is not
> performance. It is avoiding quoting issues. This is particularly so on
> Windows, where Git is sometimes called from outside a shell
> environment, where we have to deal with inconsistent quoting because
> it is every Windows program's own job to parse the command-line,
> including the quoting.
>
> Concrete example: on Windows, we have file locking issues because
> files that are in use cannot be deleted. For that reason, we have
> Windows-specific code that is "nice" by trying harder to delete files,
> giving programs a little time to let their locks go. This locking
> issue happens also when a virus scanner "uses"...
This is definitely good advice from the area expert.
Thanks for a bunch of good input.