Git development

Git development
 help / color / mirror / Atom feed

* [RFC] Second attempt at making git-clean a builtin
From: Shawn Bohrer @ 2007-11-04 19:02 UTC (permalink / raw)
  To: git; +Cc: gitster

I've taken all of the comments I received from my previous attempt see:

http://marc.info/?l=git&m=119181975419521&w=2

With these new changes in place my new git-clean passes all of the
original tests as well as the new tests I've added.  While looking at
how git-ls-files walks the tree there were some things that didn't quite
understand, or thought might be unnecessary so there may be some things I
missed.  For example I'm still not quite sure what verify_pathspec()
does.

I did however notice what I would call a bug in the behavior of
git-ls-files and therefore the current git-clean.sh.  With the current
git-clean if you have two directories that contain only untracked files,
for example docs/ and examples/ running:

git clean docs/ examples/

will not remove either directory.  Instead you must use the -d
parameter.  To me this makes sense, however if you run:

git clean docs/

it will remove the docs directory without using the -d parameter.  My
patch is at least consistent in that it requires the -d in both cases.

^ permalink raw reply

* [PATCH] Add more tests for git-clean
From: Shawn Bohrer @ 2007-11-04 19:02 UTC (permalink / raw)
  To: git; +Cc: gitster, Shawn Bohrer
In-Reply-To: <1194202941253-git-send-email-shawn.bohrer@gmail.com>

Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com>
---
 t/t7300-clean.sh |  109 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 8697213..d74c11c 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -39,6 +39,97 @@ test_expect_success 'git-clean' '
 
 '
 
+test_expect_success 'git-clean src/' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	git-clean src/ &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
+test_expect_success 'git-clean src/ src/' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	git-clean src/ src/ &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
+test_expect_success 'git-clean with prefix' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	cd src/ &&
+	git-clean &&
+	cd - &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+test_expect_success 'git-clean -d with prefix and path' '
+
+	mkdir -p build docs src/feature &&
+	touch a.out src/part3.c src/feature/file.c docs/manual.txt obj.o build/lib.so &&
+	cd src/ &&
+	git-clean -d feature/ &&
+	cd - &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test -f src/part3.c &&
+	test ! -f src/feature/file.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
+test_expect_success 'git-clean symbolic link' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	ln -s docs/manual.txt src/part4.c
+	git-clean &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test ! -f a.out &&
+	test ! -f src/part3.c &&
+	test ! -f src/part4.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
 test_expect_success 'git-clean -n' '
 
 	mkdir -p build docs &&
@@ -73,6 +164,24 @@ test_expect_success 'git-clean -d' '
 
 '
 
+test_expect_success 'git-clean -d src/ examples/' '
+
+	mkdir -p build docs examples &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so examples/1.c &&
+	git-clean -d src/ examples/ &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test ! -f examples/1.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
 test_expect_success 'git-clean -x' '
 
 	mkdir -p build docs &&
-- 
1.5.3.GIT

^ permalink raw reply related

* Re: [StGit RFC] A more structured way of calling git
From: Karl Hasselström @ 2007-11-04 18:40 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Catalin Marinas, David Kågedal, Git Mailing List
In-Reply-To: <20071103142851.GG26436@nan92-1-81-57-214-146.fbx.proxad.net>

On 2007-11-03 15:28:51 +0100, Yann Dirson wrote:

> This reminds me of someone suggesting that some patches could be
> represented by more than one commit.

You might be remebering me pointing out that the old infrastructure
supported (or at least not directly disallowed) this.

> But I'm not sure such a beast would be useful - I fear that would
> make StGIT much more complicated, but would it really make things
> better?

Yes, it makes everything much more complicated, and no, it doesn't buy
us anything new. After all, once we know the parent and the tree we
want a patch to have, we can just manufacture a commit that has that
tree and that parent.

My proposed new infrastructure cannot represent such patches, very
much by design.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: [StGit RFC] A more structured way of calling git
From: Karl Hasselström @ 2007-11-04 18:34 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson
In-Reply-To: <b0943d9e0711030356j4dcd31cbl54d838107240b3d0@mail.gmail.com>

On 2007-11-03 10:56:36 +0000, Catalin Marinas wrote:

> On 26/10/2007, Karl Hasselström <kha@treskal.com> wrote:
>
> > I wanted to build an StGit command that coalesced adjacent patches
> > to a single patch. Because the end result tree would still be the
> > same, this should be doable without ever involving HEAD, the
> > index, or the worktree.
>
> Wouldn't HEAD need to be modified since the commit log changes
> slightly, even though the tree is the same. Or am I misunderstanding
> this?

I'm refering to the HEAD tree. The HEAD commit will of course change.

> > StGit's existing infrastructure for manipulating patches didn't
> > lend itself to doing this kind of thing, though: it's not modular
> > enough. So I started to design a replacement low-level interface
> > to git, and things got slightly out of hand ... and I ended up
> > with a much bigger refactoring than I'd planned.
>
> Thanks for this. I'll need a bit of time to read it all and give
> feedback. In general, I welcome this refactoring.
>
> I'll go through the whole e-mail in the next days and get back to
> you.

Thanks, I appreciate it.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: [RFC PATCH] Make gitk use --early-output
From: David Kastrup @ 2007-11-04 18:28 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, git
In-Reply-To: <18221.2285.259487.655684@cargo.ozlabs.ibm.com>

Paul Mackerras <paulus@samba.org> writes:

> This makes gitk use the --early-output flag on the git log command.
>
> When gitk sees the "Final output:" line from git log, it goes into a
> mode where it basically just checks that it is getting the commits
> again in the same order as before.  If they are, well and good; if
> not, it truncates its internal list at the point of difference and
> proceeds to read in the commits in the new order from there on, and
> re-does the graph layout if necessary.
>
> This gives a much more immediate feel to the startup; gitk shows its
> window with the first screenful of commits displayed very quickly this
> way.

This is not strictly related with the patch: would it be possible to let
gitk just stall reading from git-rev-list if it has rendered enough
content on-screen?  The behavior I have with gitk on enormous
repositories now is that it starts up reasonably fast and nice and then
proceeds to suck up all memory in the background.

Particularly annoying is that closing its window appears to work, but
wish will still proceed sucking up all the pending git-rev-list output
and allocating memory for it before it will actually exit.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: [PATCH qgit] Add support for --early-output option of git log command
From: Marco Costalba @ 2007-11-04 18:15 UTC (permalink / raw)
  To: Michael J. Cohen; +Cc: Git Mailing List
In-Reply-To: <34C93069-06F8-44DA-A18F-EE36BB457ABC@mac.com>

On 11/4/07, Michael J. Cohen <michaeljosephcohen@mac.com> wrote:
> On Nov 4, 2007, at 5:25 AM, Marco Costalba wrote:
>
> >       bool populateRenamedPatches(SCRef sha, SCList nn, FileHistory* fh,
> > QStringList* on, bool bt);
>
> **** malformed patch at line 137: QStringList* on, bool bt);
>
> looks like it was wrapped...
>

Sorry, it's a problem with gmail, please tell me if you want me to
resend as attachment or you fix the patch yourself.

Marco

^ permalink raw reply

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
From: Linus Torvalds @ 2007-11-04 18:11 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0711032234030.15101@woody.linux-foundation.org>

On Sat, 3 Nov 2007, Linus Torvalds wrote:
> > 
> > How hard would it be to put the total number of commits on that "Final
> > output" line?  That would be useful for me.
> 
> Not hard. I think we basically have it anyway.

Actually, I take that back.

It's hard. Not because we don't have the commits, but because while we do 
the top-level shape pruning in the eearly stages, we do *not* do the final 
path-limiting until we actually output the commits.

Which actually makes "--early-output" right now do some rather odd things 
when you use a path limiter: we don't do the "rewrite_parents()" thing 
until later, so the early output will have done the first level of history 
simplification, but it won't have made history *dense* yet.

I'm looking at it now, I'll have to think about this a bit more. It might 
be trivial to fix, but this thing has real potential for being subtle.

			Linus

^ permalink raw reply

* Re: [RFC PATCH] Make gitk use --early-output
From: Linus Torvalds @ 2007-11-04 17:53 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git
In-Reply-To: <e5bfff550711040237s250bcec0iddf1ebdc616e0bbf@mail.gmail.com>

On Sun, 4 Nov 2007, Marco Costalba wrote:
> 
> But --early-output does not imply --topo-order, I guess...

Well, it does right now, because I imagined that the primary users would 
always want the topological sort.

However, I have to admit that --early-output *could* be used even without 
the topological sort, because it also works for other cases that require 
up-front limiter logic - things like ranges of commits also have to be 
fully evaluated before they are totally certain, so I could imagine seeing 
some visualizer some day that doesn't need the topo-order sort, but does 
want to get a "preliminary" list.

That said, it does seem unlikely. Anybody who asks for --early-output is 
pretty much invariably going to be an interactive visulizer: the whole 
notion doesn't make much sense otherwise. So I think I made the right 
choice in making --early-output imply topo-order, and if somebody ever 
wants to not get the output topologically sorted (unlikely), we could add 
a "--no-topo-order" flag.

Side note: if you want the "--date-order", you do need to specify *both* 
--early-output and --date-order, and it will do the right thing (ie both 
the preliminary output and the final one will be topologically sorted, but 
within that topo-sort it will be in date order rather than clumped by 
the "shape" of the history).

			Linus

^ permalink raw reply

* Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders
From: René Scharfe @ 2007-11-04 17:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0711041518130.4362@racer.site>

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
> 
>> Johannes Schindelin schrieb:
>>
>>> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
>>>
>>>> +	unsigned long occurs[ARRAY_SIZE(table)];
>>> You do not ever use the counts.  So, longs are overkill.  Even ints 
>>> might be overkill, but probably the most convenient.  I would have 
>>> gone with chars.  If I knew how to memset() an array of unsigned:1 
>>> entries to all zero, I would even have gone with that, but the runtime 
>>> cost of this is probably higher than the chars.
>> Well, it isn't used in format_commit_message() currently, but it could 
>> be.  Multiply the count and and the length of each substitution (minus 
>> the length of the placeholder) and you get the number of bytes you need 
>> to allocate.  interpolate() wouldn't need to be called twice anymore.
> 
> The better change, of course, would be to migrate interpolate() to strbuf.  
> Then you do not have to play clever tricks anymore.
>
>>> But the even more fundamental problem is that you count the needed 
>>> interpolations _every_ single time you output a commit message.
>>>
>>> A much better place would be get_commit_format().  Yes that means 
>>> restructuring the code a bit more, but I would say that this definitely 
>>> would help.  My preference would even be introducing a new source file for 
>>> the user format handling (commit-format.[ch]).
>> Counting the interpolations is easier than actually interpolating. 
>> Wherever the code goes, the calls to interpolate() and interp_count() 
>> should stay together.
> 
> No.
> 
> The purpose of calling interp_count() is to know what placeholders have to 
> be filled with substitutes.  As a consequence, the _logical_ thing to do 
> is call interp_count() _once_.
> 
> It makes absolutely no sense to call the function over and over again, 
> only to end up with the same result over and over again.

To allow this optimization, you need to make the (not yet filled)
interpolation table available to the new callsite of interp_count().
And you need to somehow pass the result of interp_count() from every
caller of it to the setup code in format_commit_message().

To see if it's worthwhile, I've just replaced the array "occurs" and the
call to interp_count() with a static array, and measured the runtime.
The speed difference was lost in the noise.

>>>> +
>>>> +/*
>>>> + * interp_count - count occurences of placeholders
>>>> + */
>>>> +void interp_count(unsigned long *result, const char *orig,
>>>> +                  const struct interp *interps, int ninterps)
>>>> +{
>>>> +	const char *src = orig;
>>> You do not ever use orig again.  So why not just use that variable instead 
>>> of introducing a new one?
>> I simply copied interpolate() and then chopped off the parts not needed
>> for counting, to make it easy to see that this is the smaller brother.
> 
> It is not.  It does not do any substitution.  It is a pure helper for the 
> process of filling the interpolation table.

Sure.  It's important, though, that it reports the same number of
substitutions as interpolate() later actually performs.  Correctness
trumps cleanliness, and its easier to check that a copy is correct, even
if certain pieces are missing.

>>> I'd rewrite this whole loop as
>>>
>>> 	while ((c = *(orig++)))
>>> 		if (c == '%')
>>> 			/* Try to match an interpolation string. */
>>> 			for (i = 0; i < ninterps; i++)
>>> 				if (prefixcmp(orig, interps[i].name)) {
>>> 					result[i] = 1;
>>> 					orig += strlen(interps[i].name);
>>> 					break;
>>> 				}
>> Cleanups are sure possible, but they should be done on top, and to both 
>> interpolate() and interp_count().  Let's first see how far we get with 
>> dumb code-copying and reusing the result in new ways. :)
> 
> Code copying is one of the primary sources for bad code.  Let's not even 
> start.
> 
> IMHO there have to be _very_ good reasons to commit something that you 
> plan to fix later anyway.

Code copying can be bad if one copies bugs.  But code copying allows a
strange feat: new code can inherit maturity.  If you copy known good
code and then change it in trivial ways (keeping the structure etc.) to
make it do new things, then the chance of a bug creeping in is lower
than if you wrote that piece of code anew.

> One such good reason would be that it is too hard to do in one go.  
> Another good reason would be that you think the fix is not even needed 
> (like I did when I wrote format: in the first place; I am quite surprised 
> that after _that_ long a time people complain -- I'd have expected 
> complaints right away or never).

Not everybody is as fast as you, Dscho. ;-)

Another idea that I was kicking around, but didn't get time to
implement: a performance regression test suite, i.e. make test for
timings and memory usages.

> In this case, I see no reason why we should go for suboptimal code first.
> 
> But hey, if you do not want to do it, I'll do it.  Just say so.

Busted again!  I wanted to see if someone else would pick up the
janitorial work for me. :-)

In any case, interpolate.c needs some attention, with or without my
patch.  I agree that a native strbuf version would be nice.  How about
an interface like that:

	typedef const char *(*expand_fn_t)
		(const char *placeholder, void *context);
	void strbuf_addexpand(struct strbuf *sb, const char *format,
	                      const char **placeholders,
	                      expand_fn_t fn, void *context);

strbuf_addexpand() would call fn() when it recognizes a placeholder,
avoiding unneeded setup code.  It could cache the result, so that fn()
gets called at most a single time for each given placeholder.  context
would be passed through to fn(), e.g. a struct commit in case of
format_commit_message().  Makes sense?

Thanks,
René

^ permalink raw reply

* Re: [PATCH qgit] Add support for --early-output option of git log command
From: Michael J. Cohen @ 2007-11-04 17:12 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List
In-Reply-To: <e5bfff550711040225ne67c907r2023b1354c35f35@mail.gmail.com>

On Nov 4, 2007, at 5:25 AM, Marco Costalba wrote:

> 	bool populateRenamedPatches(SCRef sha, SCList nn, FileHistory* fh,
> QStringList* on, bool bt);

**** malformed patch at line 137: QStringList* on, bool bt);

looks like it was wrapped...

-mjc

^ permalink raw reply

* Re: git rm --cached
From: Matthieu Moy @ 2007-11-04 17:04 UTC (permalink / raw)
  To: Jing Xue; +Cc: Remi Vanicat, git
In-Reply-To: <20071102174140.vobtdjxfwsgoc040@intranet.digizenstudio.com>

Jing Xue <jingxue@digizenstudio.com> writes:

> 1. I looked at the "index" as a staging area for _changes_ not files
> themselves. So where 'man git-rm' says '--caches ... remove[s] the
> paths only from the index, leaving working tree files.'  I took it to
> mean that it removes the changes on those paths, rather than staging a
> new "path deletion" action for a later commit.

The index is a full snapshot of "what will be commited". The
interesting parts of the index are usually the ones which differ from
either HEAD or the working tree, but the index do contain everything.

-- 
Matthieu

^ permalink raw reply

* Warning: cvsexportcommit considered dangerous
From: Johannes Schindelin @ 2007-11-04 16:41 UTC (permalink / raw)
  To: git

Hi,

ever since the up-to-date check was changed to use just one call to "cvs 
status", a bug was present.  Now cvsexportcommit expects "cvs status" to 
return the results in the same order as the file names were passed.

This is not true, as I had to realise with one of my projects on 
sourceforge.

Since time is so scarce on my side, I will not have time to fix this bug, 
but will instead return to my old "commit by hand" procedure.

Ciao,
Dscho

^ permalink raw reply

* [PATCH] t3502: Disambiguate between file and rev by adding --
From: Brian Gernhardt @ 2007-11-04 15:31 UTC (permalink / raw)
  To: git

This test failed because git-diff didn't know if it was asking for the
file "a" or the branch "a".  Adding "--" at the end of the ambiguous
commands allows the test to finish properly.

Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
---
 t/t3502-cherry-pick-merge.sh |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t3502-cherry-pick-merge.sh b/t/t3502-cherry-pick-merge.sh
index 3274c61..7c92e26 100755
--- a/t/t3502-cherry-pick-merge.sh
+++ b/t/t3502-cherry-pick-merge.sh
@@ -36,7 +36,7 @@ test_expect_success 'cherry-pick a non-merge with -m should fail' '
 	git reset --hard &&
 	git checkout a^0 &&
 	! git cherry-pick -m 1 b &&
-	git diff --exit-code a
+	git diff --exit-code a --
 
 '
 
@@ -45,7 +45,7 @@ test_expect_success 'cherry pick a merge without -m should fail' '
 	git reset --hard &&
 	git checkout a^0 &&
 	! git cherry-pick c &&
-	git diff --exit-code a
+	git diff --exit-code a --
 
 '
 
@@ -98,7 +98,7 @@ test_expect_success 'revert a merge (1)' '
 	git reset --hard &&
 	git checkout c^0 &&
 	git revert -m 1 c &&
-	git diff --exit-code a
+	git diff --exit-code a --
 
 '
 
@@ -107,7 +107,7 @@ test_expect_success 'revert a merge (2)' '
 	git reset --hard &&
 	git checkout c^0 &&
 	git revert -m 2 c &&
-	git diff --exit-code b
+	git diff --exit-code b --
 
 '
 
-- 
1.5.3.5.530.gcd7a

^ permalink raw reply related

* Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders
From: Johannes Schindelin @ 2007-11-04 15:25 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <472DDA3B.4090100@lsrfire.ath.cx>

Hi,

On Sun, 4 Nov 2007, Ren? Scharfe wrote:

> Johannes Schindelin schrieb:
> 
> > On Sun, 4 Nov 2007, Ren? Scharfe wrote:
> > 
> >> +	unsigned long occurs[ARRAY_SIZE(table)];
> > 
> > You do not ever use the counts.  So, longs are overkill.  Even ints 
> > might be overkill, but probably the most convenient.  I would have 
> > gone with chars.  If I knew how to memset() an array of unsigned:1 
> > entries to all zero, I would even have gone with that, but the runtime 
> > cost of this is probably higher than the chars.
> 
> Well, it isn't used in format_commit_message() currently, but it could 
> be.  Multiply the count and and the length of each substitution (minus 
> the length of the placeholder) and you get the number of bytes you need 
> to allocate.  interpolate() wouldn't need to be called twice anymore.

The better change, of course, would be to migrate interpolate() to strbuf.  
Then you do not have to play clever tricks anymore.

> > But the even more fundamental problem is that you count the needed 
> > interpolations _every_ single time you output a commit message.
> > 
> > A much better place would be get_commit_format().  Yes that means 
> > restructuring the code a bit more, but I would say that this definitely 
> > would help.  My preference would even be introducing a new source file for 
> > the user format handling (commit-format.[ch]).
> 
> Counting the interpolations is easier than actually interpolating. 
> Wherever the code goes, the calls to interpolate() and interp_count() 
> should stay together.

No.

The purpose of calling interp_count() is to know what placeholders have to 
be filled with substitutes.  As a consequence, the _logical_ thing to do 
is call interp_count() _once_.

It makes absolutely no sense to call the function over and over again, 
only to end up with the same result over and over again.

> >> +
> >> +/*
> >> + * interp_count - count occurences of placeholders
> >> + */
> >> +void interp_count(unsigned long *result, const char *orig,
> >> +                  const struct interp *interps, int ninterps)
> >> +{
> >> +	const char *src = orig;
> > 
> > You do not ever use orig again.  So why not just use that variable instead 
> > of introducing a new one?
> 
> I simply copied interpolate() and then chopped off the parts not needed
> for counting, to make it easy to see that this is the smaller brother.

It is not.  It does not do any substitution.  It is a pure helper for the 
process of filling the interpolation table.

> > I'd rewrite this whole loop as
> > 
> > 	while ((c = *(orig++)))
> > 		if (c == '%')
> > 			/* Try to match an interpolation string. */
> > 			for (i = 0; i < ninterps; i++)
> > 				if (prefixcmp(orig, interps[i].name)) {
> > 					result[i] = 1;
> > 					orig += strlen(interps[i].name);
> > 					break;
> > 				}
> 
> Cleanups are sure possible, but they should be done on top, and to both 
> interpolate() and interp_count().  Let's first see how far we get with 
> dumb code-copying and reusing the result in new ways. :)

Code copying is one of the primary sources for bad code.  Let's not even 
start.

IMHO there have to be _very_ good reasons to commit something that you 
plan to fix later anyway.

One such good reason would be that it is too hard to do in one go.  
Another good reason would be that you think the fix is not even needed 
(like I did when I wrote format: in the first place; I am quite surprised 
that after _that_ long a time people complain -- I'd have expected 
complaints right away or never).

In this case, I see no reason why we should go for suboptimal code first.

But hey, if you do not want to do it, I'll do it.  Just say so.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH 1/5] pretty describe: add name info to struct commit
From: René Scharfe @ 2007-11-04 15:06 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List, Johannes Schindelin
In-Reply-To: <20071104140700.GB3078@steel.home>

Alex Riesen schrieb:
> René Scharfe, Sun, Nov 04, 2007 12:48:22 +0100:
>> diff --git a/commit.h b/commit.h
>> index b661503..80e94b9 100644
>> --- a/commit.h
>> +++ b/commit.h
>> @@ -18,6 +18,9 @@ struct commit {
>>  	struct commit_list *parents;
>>  	struct tree *tree;
>>  	char *buffer;
>> +	char *name;
>> +	unsigned int name_flags;
>> +	char name_prio;
>>  };
> 
> It increases size of struct commit by ~12 bytes (assuming 4byte
> allignment), and this is a popular structure. Besides, the three new
> fields used by only git-describe, which nobody has in their top-ten
> used commands (see "best git practices" thread).

True.  When I was looking for a place for the name info I was a bit
worried about this increase, but dismissed it after looking at the
kernel repository: there are ca. 140000 commits, which means my patch
increased memory usage by 2MB for commands that operate on all commits
at the same time.  I haven't taken any measurements to back up this
estimate, though..

I had looked briefly at the decorate stuff that Dscho mentioned in
another reply, but I can't remember why I didn't use it.  Guess I wasn't
motivated enough by those 2MB. ;-)  I'll take another look.

Thanks,
René

^ permalink raw reply

* Re: [PATCH 3/5] pretty describe: move library functions to the new file describe.c
From: René Scharfe @ 2007-11-04 14:56 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0711041435540.4362@racer.site>

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
> 
>>  Makefile           |    2 +-
>>  builtin-describe.c |  202 ---------------------------------------------------
>>  cache.h            |    5 ++
>>  describe.c         |  204 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 210 insertions(+), 203 deletions(-)
>>  create mode 100644 describe.c
> 
> Would not "format-patch -C -C" have given a nicer output?

Yes, it would have been shorter, but it looks a bit strange, because it's
deleting stuff from the new file describe.c (i.e. all the things left in
builtin-describe.c):

 Makefile                         |    2 +-
 builtin-describe.c               |  202 --------------------------------------
 cache.h                          |    5 +
 builtin-describe.c => describe.c |   85 ----------------
 4 files changed, 6 insertions(+), 288 deletions(-)
 copy builtin-describe.c => describe.c (66%)

That's 367 lines of patch + stat vs. 470 lines for the one I sent.  Will
use next time.

Thanks,
René

^ permalink raw reply

* Re: [PATCH 04/10] Migrate git-clone to use git-rev-parse --parseopt
From: Pierre Habouzit @ 2007-11-04 14:49 UTC (permalink / raw)
  To: gitster; +Cc: git
In-Reply-To: <1194172262-1563-5-git-send-email-madcoder@debian.org>

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

  Note: this patch now conflicts with a recent patch to make git clone
grok `--`. As git rev-parse --parseopt does that as a side effect, you
can force the update to the parseopt version without functionality loss.

Cheers,
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders
From: René Scharfe @ 2007-11-04 14:42 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0711041356330.4362@racer.site>

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
> 
>> The new placeholders %ds (description string, git-describe style), %dn 
>> (the name part) and %dd (the depth part) are added.
>>
>> To avoid imposing the significant cost of running describe_commit() on 
>> every format string, even if none of the new placeholders are used, a 
>> new function, interp_count(), is introduced.  It is a stripped down 
>> version of interpolate(), that simply counts the placeholders in a 
>> format string.  If the describe placeholders are not found, setting up 
>> the corresponding replacements is skipped.
> 
> The way I read this, those are two really quite independent patches 
> squashed into one.

Busted!  I didn't want to introduce a performance regression with the
%ds parsing code, but I also didn't want to add a function without
users.  Patch 6 was then glued on as an afterthought -- the rest of the
series was ready when I saw Paul's mail.

So, a better way might be to move patch 6 to the head of the series and
introduce interp_count() in it, too.

>> +	unsigned long occurs[ARRAY_SIZE(table)];
> 
> You do not ever use the counts.  So, longs are overkill.  Even ints might 
> be overkill, but probably the most convenient.  I would have gone with 
> chars.  If I knew how to memset() an array of unsigned:1 entries to all 
> zero, I would even have gone with that, but the runtime cost of this is 
> probably higher than the chars.

Well, it isn't used in format_commit_message() currently, but it could
be.  Multiply the count and and the length of each substitution (minus
the length of the placeholder) and you get the number of bytes you need
to allocate.  interpolate() wouldn't need to be called twice anymore.

> But the even more fundamental problem is that you count the needed 
> interpolations _every_ single time you output a commit message.
> 
> A much better place would be get_commit_format().  Yes that means 
> restructuring the code a bit more, but I would say that this definitely 
> would help.  My preference would even be introducing a new source file for 
> the user format handling (commit-format.[ch]).

Counting the interpolations is easier than actually interpolating.
Wherever the code goes, the calls to interpolate() and interp_count()
should stay together.

>> +
>> +/*
>> + * interp_count - count occurences of placeholders
>> + */
>> +void interp_count(unsigned long *result, const char *orig,
>> +                  const struct interp *interps, int ninterps)
>> +{
>> +	const char *src = orig;
> 
> You do not ever use orig again.  So why not just use that variable instead 
> of introducing a new one?

I simply copied interpolate() and then chopped off the parts not needed
for counting, to make it easy to see that this is the smaller brother.

> 
>> +	const char *name;
>> +	unsigned long namelen;
>> +	int i;
>> +	char c;
>> +
>> +	for (i = 0; i < ninterps; i++)
>> +		result[i] = 0;
> 
> memset()?

In earlier versions there was a memset() there.  I replaced it to make
the intent even more clear, but I guess not using memset() is simply
superstition..

> 
>> +
>> +	while ((c = *src)) {
>> +		if (c == '%') {
>> +			/* Try to match an interpolation string. */
>> +			for (i = 0; i < ninterps; i++) {
>> +				name = interps[i].name;
>> +				namelen = strlen(name);
>> +				if (strncmp(src, name, namelen) == 0)
> 
> prefixcmp()?

Copied from interpolate()..

>> +					break;
>> +			}
>> +
>> +			/* Check for valid interpolation. */
>> +			if (i < ninterps) {
> 
> This is ugly.  You already had a successful if() for that case earlier.

Dito..

> 
>> +				result[i]++;
>> +				src += namelen;
>> +				continue;
>> +			}
>> +		}
>> +		src++;
>> +	}
>> +}
> 
> I'd rewrite this whole loop as
> 
> 	while ((c = *(orig++)))
> 		if (c == '%')
> 			/* Try to match an interpolation string. */
> 			for (i = 0; i < ninterps; i++)
> 				if (prefixcmp(orig, interps[i].name)) {
> 					result[i] = 1;
> 					orig += strlen(interps[i].name);
> 					break;
> 				}

Cleanups are sure possible, but they should be done on top, and to both
interpolate() and interp_count().  Let's first see how far we get with
dumb code-copying and reusing the result in new ways. :)

Thanks,
René

^ permalink raw reply

* Re: [PATCH 3/5] pretty describe: move library functions to the new file describe.c
From: Johannes Schindelin @ 2007-11-04 14:36 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <472DB199.2040305@lsrfire.ath.cx>

Hi,

On Sun, 4 Nov 2007, Ren? Scharfe wrote:

>  Makefile           |    2 +-
>  builtin-describe.c |  202 ---------------------------------------------------
>  cache.h            |    5 ++
>  describe.c         |  204 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 210 insertions(+), 203 deletions(-)
>  create mode 100644 describe.c

Would not "format-patch -C -C" have given a nicer output?

Ciao,
Dscho

^ permalink raw reply

* [PATCH] Remove a couple of duplicated include
From: Marco Costalba @ 2007-11-04 14:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
---
 compat/inet_ntop.c |    1 -
 compat/inet_pton.c |    1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/compat/inet_ntop.c b/compat/inet_ntop.c
index 4d7ab9d..f444982 100644
--- a/compat/inet_ntop.c
+++ b/compat/inet_ntop.c
@@ -18,7 +18,6 @@
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/socket.h>
-#include <sys/socket.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <stdio.h>
diff --git a/compat/inet_pton.c b/compat/inet_pton.c
index 5704e0d..4078fc0 100644
--- a/compat/inet_pton.c
+++ b/compat/inet_pton.c
@@ -18,7 +18,6 @@
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/socket.h>
-#include <sys/socket.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <stdio.h>
-- 
1.5.3.5.532.g5c38-dirty

^ permalink raw reply related

* Re: [PATCH 1/5] pretty describe: add name info to struct commit
From: Johannes Schindelin @ 2007-11-04 14:22 UTC (permalink / raw)
  To: Alex Riesen; +Cc: René Scharfe, Junio C Hamano, Git Mailing List
In-Reply-To: <20071104140700.GB3078@steel.home>

Hi,

On Sun, 4 Nov 2007, Alex Riesen wrote:

> Ren? Scharfe, Sun, Nov 04, 2007 12:48:22 +0100:
> > diff --git a/commit.h b/commit.h
> > index b661503..80e94b9 100644
> > --- a/commit.h
> > +++ b/commit.h
> > @@ -18,6 +18,9 @@ struct commit {
> >  	struct commit_list *parents;
> >  	struct tree *tree;
> >  	char *buffer;
> > +	char *name;
> > +	unsigned int name_flags;
> > +	char name_prio;
> >  };
> 
> It increases size of struct commit by ~12 bytes (assuming 4byte
> allignment), and this is a popular structure.

I was just about to say something similar.

But there is a wonderful solution: Have a look at decorate.[ch].  This is 
the structure you should use IMHO.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH 9/5] Migrate git-checkout.sh to use git-rev-parse --parseopt  --keep-dashdash
From: Pierre Habouzit @ 2007-11-04 14:18 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0711041343470.4362@racer.site>

[-- Attachment #1: Type: text/plain, Size: 2481 bytes --]

On Sun, Nov 04, 2007 at 01:48:36PM +0000, Johannes Schindelin wrote:
> > Pierre Habouzit <madcoder@debian.org> writes:
> > It takes on the standard input the specification of the options to parse
> > and understand, and echoes on the standard ouput a line suitable for 
> > `sh(1)` `eval` to replace the arguments with normalized ones.
> 
> Why not go the full nine yards and output something which when eval'ed 
> sets the variables correctly (taking the variable names from the option 
> names; long name if available, otherwise short one)?  It can also set the 
> command line arguments to what's left after option parsing, with a "set" 
> call.

We could do that, though it's not as great as it looks like at the first
glance. If you want -vvv to work like an accumulator, then you need a
really more complex approach in the C code. To enter the gory details,
git-rev-parse --parseopt uses a callback that deals with options and
their arguments one by one, then appends a delimiter to tell the shell
script that only arguments follow, and then appends the arguments the
option parser left alone.

It does not deal with the semantics that the C has available at all,
it's up to the shell script to decide which is better.

My goal with this is not really to do all the work for the shell script
author, but rather to the user: it's not because a porcelain is new (or
not a builtin yet) that it should have a creepy interface. If it helps
the programmer as a side effect, then it's great, but this series really
is about usability to me.

Of course we can do what you propose, but it will probably be quite
sophisticated and looks to me like an overkill to what shell builtins
really are used for: prototyping a new porcelain until it becomes a new
full blown C-builtin.  I do believe in simplicity after all :)

> And to prevent funny games with "PARSEOPT_OPTS=blabla git xyz", why not 
> provide a function in git-sh-setup which takes the string as argument, and 
> is called _after_ sourcing git-sh-setup?

This one is quite a non issue, it's only used for keep-dashdash, and I
see no other need in the near future. And if the need arise, it'll still
be doable any time. So for now I've taken a non-generic way to do that,
and I believe it's fine as it is.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders
From: Johannes Schindelin @ 2007-11-04 14:11 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <472DB1B0.1050904@lsrfire.ath.cx>

Hi,

On Sun, 4 Nov 2007, Ren? Scharfe wrote:

> The new placeholders %ds (description string, git-describe style), %dn 
> (the name part) and %dd (the depth part) are added.
> 
> To avoid imposing the significant cost of running describe_commit() on 
> every format string, even if none of the new placeholders are used, a 
> new function, interp_count(), is introduced.  It is a stripped down 
> version of interpolate(), that simply counts the placeholders in a 
> format string.  If the describe placeholders are not found, setting up 
> the corresponding replacements is skipped.

The way I read this, those are two really quite independent patches 
squashed into one.

> +	unsigned long occurs[ARRAY_SIZE(table)];

You do not ever use the counts.  So, longs are overkill.  Even ints might 
be overkill, but probably the most convenient.  I would have gone with 
chars.  If I knew how to memset() an array of unsigned:1 entries to all 
zero, I would even have gone with that, but the runtime cost of this is 
probably higher than the chars.

But the even more fundamental problem is that you count the needed 
interpolations _every_ single time you output a commit message.

A much better place would be get_commit_format().  Yes that means 
restructuring the code a bit more, but I would say that this definitely 
would help.  My preference would even be introducing a new source file for 
the user format handling (commit-format.[ch]).

> +
> +/*
> + * interp_count - count occurences of placeholders
> + */
> +void interp_count(unsigned long *result, const char *orig,
> +                  const struct interp *interps, int ninterps)
> +{
> +	const char *src = orig;

You do not ever use orig again.  So why not just use that variable instead 
of introducing a new one?

> +	const char *name;
> +	unsigned long namelen;
> +	int i;
> +	char c;
> +
> +	for (i = 0; i < ninterps; i++)
> +		result[i] = 0;

memset()?

> +
> +	while ((c = *src)) {
> +		if (c == '%') {
> +			/* Try to match an interpolation string. */
> +			for (i = 0; i < ninterps; i++) {
> +				name = interps[i].name;
> +				namelen = strlen(name);
> +				if (strncmp(src, name, namelen) == 0)

prefixcmp()?

> +					break;
> +			}
> +
> +			/* Check for valid interpolation. */
> +			if (i < ninterps) {

This is ugly.  You already had a successful if() for that case earlier.

> +				result[i]++;
> +				src += namelen;
> +				continue;
> +			}
> +		}
> +		src++;
> +	}
> +}

I'd rewrite this whole loop as

	while ((c = *(orig++)))
		if (c == '%')
			/* Try to match an interpolation string. */
			for (i = 0; i < ninterps; i++)
				if (prefixcmp(orig, interps[i].name)) {
					result[i] = 1;
					orig += strlen(interps[i].name);
					break;
				}

Ciao,
Dscho

^ permalink raw reply

* [PATCH 7/5] pretty describe: add min_prio parameter to describe_commit()
From: René Scharfe @ 2007-11-04 14:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <472DB1B0.1050904@lsrfire.ath.cx>

If load_commit_names() is called with a certain priority and later with
a higher priority, all the old, low-priority names are still kept and
describe_commit() will happily take them into account.  It can thus
report heads, even if the user asked for annotated tags.

In the current code, this can't happen, because builtin-describe.c calls
load_commit_names() only once, and commit.c always asks for the highest
priority (annotated tags).  This patch fixes the problem anyway, for the
benefit of future users.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---

 builtin-describe.c |    6 ++++--
 cache.h            |    2 +-
 commit.c           |    2 +-
 describe.c         |    7 ++++---
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/builtin-describe.c b/builtin-describe.c
index fcd93f4..481d92f 100644
--- a/builtin-describe.c
+++ b/builtin-describe.c
@@ -14,6 +14,7 @@ static int all;	/* Default to annotated tags only */
 static int tags;	/* But allow any tags if --tags is specified */
 static int abbrev = DEFAULT_ABBREV;
 static int max_candidates = 10;
+static int min_prio;
 
 static void describe(const char *arg, int last_one)
 {
@@ -28,7 +29,7 @@ static void describe(const char *arg, int last_one)
 	if (!cmit)
 		die("%s is not a valid '%s' object", arg, commit_type);
 
-	name = describe_commit(cmit, max_candidates, debug, &depth);
+	name = describe_commit(cmit, max_candidates, min_prio, debug, &depth);
 	if (!name)
 		die("cannot describe '%s'", sha1_to_hex(cmit->object.sha1));
 
@@ -74,7 +75,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 	 * If --tags, then any tags are used.
 	 * Otherwise only annotated tags are used.
 	 */
-	load_commit_names(all ? 0 : (tags ? 1 : 2));
+	min_prio = all ? 0 : (tags ? 1 : 2);
+	load_commit_names(min_prio);
 
 	if (argc == 0) {
 		describe("HEAD", 1);
diff --git a/cache.h b/cache.h
index 84423a3..703d6a9 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,6 @@ void shift_tree(const unsigned char *, const unsigned char *, unsigned char *, i
 /* describe.c */
 struct commit;
 extern void load_commit_names(int min_prio);
-extern char *describe_commit(struct commit *cmit, int max_candidates, int debug, int *depthp);
+extern char *describe_commit(struct commit *cmit, int max_candidates, int min_prio, int debug, int *depthp);
 
 #endif /* CACHE_H */
diff --git a/commit.c b/commit.c
index 9ff4735..624005b 100644
--- a/commit.c
+++ b/commit.c
@@ -897,7 +897,7 @@ void format_commit_message(struct commit *commit,
 		const char *abbr;
 
 		load_commit_names(2);
-		name = describe_commit(commit, 10, 0, &depth);
+		name = describe_commit(commit, 10, 2, 0, &depth);
 		clear_commit_name_flags(commit);
 		abbr = find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV);
 
diff --git a/describe.c b/describe.c
index 18c7abc..b6de4c1 100644
--- a/describe.c
+++ b/describe.c
@@ -102,7 +102,8 @@ static unsigned long finish_depth_computation(
 	return seen_commits;
 }
 
-char *describe_commit(struct commit *cmit, int max_candidates, int debug, int *depthp)
+char *describe_commit(struct commit *cmit, int max_candidates, int min_prio,
+                      int debug, int *depthp)
 {
 	struct commit *gave_up_on = NULL;
 	struct commit_list *list;
@@ -110,7 +111,7 @@ char *describe_commit(struct commit *cmit, int max_candidates, int debug, int *d
 	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
 	unsigned long seen_commits = 0;
 
-	if (cmit->name) {
+	if (cmit->name && cmit->name_prio >= min_prio) {
 		*depthp = 0;
 		return cmit->name;
 	}
@@ -130,7 +131,7 @@ char *describe_commit(struct commit *cmit, int max_candidates, int debug, int *d
 		struct commit *c = pop_commit(&list);
 		struct commit_list *parents = c->parents;
 		seen_commits++;
-		if (c->name) {
+		if (c->name && c->name_prio >= min_prio) {
 			if (match_cnt < max_candidates) {
 				struct possible_tag *t = &all_matches[match_cnt++];
 				t->name = c->name;

^ permalink raw reply related

* Re: [PATCH 1/5] pretty describe: add name info to struct commit
From: Alex Riesen @ 2007-11-04 14:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <472DB186.9030909@lsrfire.ath.cx>

René Scharfe, Sun, Nov 04, 2007 12:48:22 +0100:
> diff --git a/commit.h b/commit.h
> index b661503..80e94b9 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -18,6 +18,9 @@ struct commit {
>  	struct commit_list *parents;
>  	struct tree *tree;
>  	char *buffer;
> +	char *name;
> +	unsigned int name_flags;
> +	char name_prio;
>  };

It increases size of struct commit by ~12 bytes (assuming 4byte
allignment), and this is a popular structure. Besides, the three new
fields used by only git-describe, which nobody has in their top-ten
used commands (see "best git practices" thread). If the fields are so
badly needed (and the information can't (really?) be stored somewhere
else), maybe they could be at least compressed when not used:

	struct commit_name_info {
		unsigned int name_flags;
		char name_prio;
		char name[FLEX_ARRAY];
	};

	struct commit {
		struct commit_list *parents;
		struct tree *tree;
		char *buffer;
		struct commit_name_info *name_info;
	};

BTW, do we still have some bits left in struct object->flags?
To, for example, switch between normal and expanded structure of
second-level object (i.e. commit and commit-with-names).

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox