Git development


* Re: [PATCH 04/10] Migrate git-clone to use git-rev-parse --parseopt
From: Pierre Habouzit @ 2007-11-04 14:49 UTC (permalink / raw)
  To: gitster; +Cc: git
In-Reply-To: <1194172262-1563-5-git-send-email-madcoder@debian.org>

  Note: this patch now conflicts with a recent patch to make git clone
grok `--`. As git rev-parse --parseopt does that as a side effect, you
can force the update to the parseopt version without functionality loss.

Cheers,
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

^ permalink raw reply

* Re: [PATCH 3/5] pretty describe: move library functions to the new file describe.c
From: René Scharfe @ 2007-11-04 14:56 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0711041435540.4362@racer.site>

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, René Scharfe wrote:
> 
>>  Makefile           |    2 +-
>>  builtin-describe.c |  202 ---------------------------------------------------
>>  cache.h            |    5 ++
>>  describe.c         |  204 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 210 insertions(+), 203 deletions(-)
>>  create mode 100644 describe.c
> 
> Would not "format-patch -C -C" have given a nicer output?

Yes, it would have been shorter, but it looks a bit strange, because it's
deleting stuff from the new file describe.c (i.e. all the things left in
builtin-describe.c):

 Makefile                         |    2 +-
 builtin-describe.c               |  202 --------------------------------------
 cache.h                          |    5 +
 builtin-describe.c => describe.c |   85 ----------------
 4 files changed, 6 insertions(+), 288 deletions(-)
 copy builtin-describe.c => describe.c (66%)

That's 367 lines of patch + stat vs. 470 lines for the one I sent.  Will
use it next time.

Thanks,
René

^ permalink raw reply

* Re: [PATCH 1/5] pretty describe: add name info to struct commit
From: René Scharfe @ 2007-11-04 15:06 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List, Johannes Schindelin
In-Reply-To: <20071104140700.GB3078@steel.home>

Alex Riesen schrieb:
> René Scharfe, Sun, Nov 04, 2007 12:48:22 +0100:
>> diff --git a/commit.h b/commit.h
>> index b661503..80e94b9 100644
>> --- a/commit.h
>> +++ b/commit.h
>> @@ -18,6 +18,9 @@ struct commit {
>>  	struct commit_list *parents;
>>  	struct tree *tree;
>>  	char *buffer;
>> +	char *name;
>> +	unsigned int name_flags;
>> +	char name_prio;
>>  };
> 
> It increases the size of struct commit by ~12 bytes (assuming 4-byte
> alignment), and this is a popular structure. Besides, the three new
> fields are used only by git-describe, which nobody has in their top-ten
> used commands (see the "best git practices" thread).

True.  When I was looking for a place for the name info I was a bit
worried about this increase, but dismissed it after looking at the
kernel repository: there are ca. 140000 commits, which means my patch
increased memory usage by 2MB for commands that operate on all commits
at the same time.  I haven't taken any measurements to back up this
estimate, though.

I had looked briefly at the decorate stuff that Dscho mentioned in
another reply, but I can't remember why I didn't use it.  Guess I wasn't
motivated enough by those 2MB. ;-)  I'll take another look.

Thanks,
René

^ permalink raw reply
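
The estimate in the message above can be checked with a small sketch. The two structs below are hypothetical stand-ins that contain only the fields visible in the quoted hunk (the real struct commit has more members), and the function names are invented for illustration. With 4-byte pointers and alignment the three new fields add 12 bytes, and 140000 commits times 12 bytes is 1680000 bytes, which is in line with the 2MB ballpark. On a typical 64-bit build the wider pointer and the padding after name_prio would make it 16 bytes per commit instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the fields shown in the quoted hunk. */
struct commit_before {
	void *parents;
	void *tree;
	char *buffer;
};

/* The same stand-in with the three fields the patch adds. */
struct commit_after {
	void *parents;
	void *tree;
	char *buffer;
	char *name;
	unsigned int name_flags;
	char name_prio;
};

/* Bytes added per commit on the platform this is compiled for. */
size_t per_commit_growth(void)
{
	return sizeof(struct commit_after) - sizeof(struct commit_before);
}

/* Extra memory when all commits are held in memory at the same time. */
size_t total_growth(size_t commits, size_t bytes_per_commit)
{
	/* Refuse inputs whose product would overflow size_t. */
	assert(bytes_per_commit == 0 ||
	       commits <= (size_t)-1 / bytes_per_commit);
	return commits * bytes_per_commit;
}
```

Calling total_growth(140000, per_commit_growth()) gives the figure for the local platform; as the message says, this is an estimate and not a measurement.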

* Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders
From: Johannes Schindelin @ 2007-11-04 15:25 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <472DDA3B.4090100@lsrfire.ath.cx>

Hi,

On Sun, 4 Nov 2007, René Scharfe wrote:

> Johannes Schindelin schrieb:
> 
> > On Sun, 4 Nov 2007, René Scharfe wrote:
> > 
> >> +	unsigned long occurs[ARRAY_SIZE(table)];
> > 
> > You do not ever use the counts.  So, longs are overkill.  Even ints 
> > might be overkill, but probably the most convenient.  I would have 
> > gone with chars.  If I knew how to memset() an array of unsigned:1 
> > entries to all zero, I would even have gone with that, but the runtime 
> > cost of this is probably higher than the chars.
> 
> Well, it isn't used in format_commit_message() currently, but it could 
> be.  Multiply the count and the length of each substitution (minus 
> the length of the placeholder) and you get the number of bytes you need 
> to allocate.  interpolate() wouldn't need to be called twice anymore.

The better change, of course, would be to migrate interpolate() to strbuf.  
Then you do not have to play clever tricks anymore.

> > But the even more fundamental problem is that you count the needed 
> > interpolations _every_ single time you output a commit message.
> > 
> > A much better place would be get_commit_format().  Yes that means 
> > restructuring the code a bit more, but I would say that this definitely 
> > would help.  My preference would even be introducing a new source file for 
> > the user format handling (commit-format.[ch]).
> 
> Counting the interpolations is easier than actually interpolating. 
> Wherever the code goes, the calls to interpolate() and interp_count() 
> should stay together.

No.

The purpose of calling interp_count() is to know what placeholders have to 
be filled with substitutes.  As a consequence, the _logical_ thing to do 
is call interp_count() _once_.

It makes absolutely no sense to call the function over and over again, 
only to end up with the same result over and over again.

> >> +
> >> +/*
> >> + * interp_count - count occurrences of placeholders
> >> + */
> >> +void interp_count(unsigned long *result, const char *orig,
> >> +                  const struct interp *interps, int ninterps)
> >> +{
> >> +	const char *src = orig;
> > 
> > You do not ever use orig again.  So why not just use that variable instead 
> > of introducing a new one?
> 
> I simply copied interpolate() and then chopped off the parts not needed
> for counting, to make it easy to see that this is the smaller brother.

It is not.  It does not do any substitution.  It is a pure helper for the 
process of filling the interpolation table.

> > I'd rewrite this whole loop as
> > 
> > 	while ((c = *(orig++)))
> > 		if (c == '%')
> > 			/* Try to match an interpolation string. */
> > 			for (i = 0; i < ninterps; i++)
> > 				if (prefixcmp(orig, interps[i].name)) {
> > 					result[i] = 1;
> > 					orig += strlen(interps[i].name);
> > 					break;
> > 				}
> 
> Cleanups are sure possible, but they should be done on top, and to both 
> interpolate() and interp_count().  Let's first see how far we get with 
> dumb code-copying and reusing the result in new ways. :)

Code copying is one of the primary sources for bad code.  Let's not even 
start.

IMHO there have to be _very_ good reasons to commit something that you 
plan to fix later anyway.

One such good reason would be that it is too hard to do in one go.  
Another good reason would be that you think the fix is not even needed 
(like I did when I wrote format: in the first place; I am quite surprised 
that after _that_ long a time people complain -- I'd have expected 
complaints right away or never).

In this case, I see no reason why we should go for suboptimal code first.

But hey, if you do not want to do it, I'll do it.  Just say so.

Ciao,
Dscho

^ permalink raw reply
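
The "record which placeholders occur" loop discussed above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the patch: struct interp_name and mark_placeholders() are hypothetical names, the placeholder names are assumed to include the leading '%' (for example "%H"), and strncmp() stands in for git's prefixcmp(). Both return 0 on a match, so the match test needs a negation, and the comparison has to start at the '%' itself when the names include it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's struct interp; only the name matters. */
struct interp_name {
	const char *name;	/* assumed to include the '%', e.g. "%H" */
};

/*
 * Set used[i] to 1 for every placeholder that occurs in fmt at least
 * once.  The caller zero-fills used[] before the call.
 */
void mark_placeholders(unsigned char *used, const char *fmt,
		       const struct interp_name *interps, int ninterps)
{
	assert(fmt != NULL);
	while (*fmt) {
		int i, matched = 0;

		if (*fmt == '%') {
			/* Try to match an interpolation string. */
			for (i = 0; i < ninterps; i++) {
				size_t len = strlen(interps[i].name);

				/* strncmp() returns 0 on a match. */
				if (len && !strncmp(fmt, interps[i].name, len)) {
					used[i] = 1;
					fmt += len;
					matched = 1;
					break;
				}
			}
		}
		if (!matched)
			fmt++;	/* ordinary character or unknown '%' */
	}
}
```

Because the result depends only on the format string, a caller could run this once when the format is parsed and reuse the used[] array for every commit, which is the point made in the message above.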

* [PATCH] t3502: Disambiguate between file and rev by adding --
From: Brian Gernhardt @ 2007-11-04 15:31 UTC (permalink / raw)
  To: git

This test failed because git-diff didn't know if it was asking for the
file "a" or the branch "a".  Adding "--" at the end of the ambiguous
commands allows the test to finish properly.

Signed-off-by: Brian Gernhardt <benji@silverinsanity.com>
---
 t/t3502-cherry-pick-merge.sh |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t3502-cherry-pick-merge.sh b/t/t3502-cherry-pick-merge.sh
index 3274c61..7c92e26 100755
--- a/t/t3502-cherry-pick-merge.sh
+++ b/t/t3502-cherry-pick-merge.sh
@@ -36,7 +36,7 @@ test_expect_success 'cherry-pick a non-merge with -m should fail' '
 	git reset --hard &&
 	git checkout a^0 &&
 	! git cherry-pick -m 1 b &&
-	git diff --exit-code a
+	git diff --exit-code a --
 
 '
 
@@ -45,7 +45,7 @@ test_expect_success 'cherry pick a merge without -m should fail' '
 	git reset --hard &&
 	git checkout a^0 &&
 	! git cherry-pick c &&
-	git diff --exit-code a
+	git diff --exit-code a --
 
 '
 
@@ -98,7 +98,7 @@ test_expect_success 'revert a merge (1)' '
 	git reset --hard &&
 	git checkout c^0 &&
 	git revert -m 1 c &&
-	git diff --exit-code a
+	git diff --exit-code a --
 
 '
 
@@ -107,7 +107,7 @@ test_expect_success 'revert a merge (2)' '
 	git reset --hard &&
 	git checkout c^0 &&
 	git revert -m 2 c &&
-	git diff --exit-code b
+	git diff --exit-code b --
 
 '
 
-- 
1.5.3.5.530.gcd7a

^ permalink raw reply related

* Warning: cvsexportcommit considered dangerous
From: Johannes Schindelin @ 2007-11-04 16:41 UTC (permalink / raw)
  To: git

Hi,

ever since the up-to-date check was changed to use just one call to "cvs 
status", a bug was present.  Now cvsexportcommit expects "cvs status" to 
return the results in the same order as the file names were passed.

This is not true, as I had to realise with one of my projects on 
sourceforge.

Since time is so scarce on my side, I will not have time to fix this bug, 
but will instead return to my old "commit by hand" procedure.

Ciao,
Dscho

^ permalink raw reply
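
The failure mode described above comes from pairing the i-th reported status with the i-th file name that was passed in. One way to avoid relying on output order is to look each status up by file name. The sketch below shows the idea in C, not in the Perl that cvsexportcommit is written in; struct file_status and status_for() are hypothetical and are not taken from git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed entry of status output (hypothetical layout). */
struct file_status {
	const char *name;	/* file name as reported */
	const char *status;	/* e.g. "Up-to-date" */
};

/*
 * Return the status reported for name, or NULL if the file was not
 * reported at all.  The order of the reported[] entries is irrelevant.
 */
const char *status_for(const struct file_status *reported, size_t n,
		       const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (!strcmp(reported[i].name, name))
			return reported[i].status;
	return NULL;
}
```

A linear scan is enough for the handful of files a single commit touches. The sketch assumes the reported names are unique and comparable to the names that were requested, which would need checking against real "cvs status" output.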

* Re: git rm --cached
From: Matthieu Moy @ 2007-11-04 17:04 UTC (permalink / raw)
  To: Jing Xue; +Cc: Remi Vanicat, git
In-Reply-To: <20071102174140.vobtdjxfwsgoc040@intranet.digizenstudio.com>

Jing Xue <jingxue@digizenstudio.com> writes:

> 1. I looked at the "index" as a staging area for _changes_ not files
> themselves. So where 'man git-rm' says '--cached ... remove[s] the
> paths only from the index, leaving working tree files.'  I took it to
> mean that it removes the changes on those paths, rather than staging a
> new "path deletion" action for a later commit.

The index is a full snapshot of "what will be committed". The
interesting parts of the index are usually the ones which differ from
either HEAD or the working tree, but the index does contain everything.

-- 
Matthieu

^ permalink raw reply

* Re: [PATCH qgit] Add support for --early-output option of git log command
From: Michael J. Cohen @ 2007-11-04 17:12 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List
In-Reply-To: <e5bfff550711040225ne67c907r2023b1354c35f35@mail.gmail.com>

On Nov 4, 2007, at 5:25 AM, Marco Costalba wrote:

> 	bool populateRenamedPatches(SCRef sha, SCList nn, FileHistory* fh,
> QStringList* on, bool bt);

**** malformed patch at line 137: QStringList* on, bool bt);

looks like it was wrapped...

-mjc

^ permalink raw reply

* Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders
From: René Scharfe @ 2007-11-04 17:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0711041518130.4362@racer.site>

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, René Scharfe wrote:
> 
>> Johannes Schindelin schrieb:
>>
>>> On Sun, 4 Nov 2007, René Scharfe wrote:
>>>
>>>> +	unsigned long occurs[ARRAY_SIZE(table)];
>>> You do not ever use the counts.  So, longs are overkill.  Even ints 
>>> might be overkill, but probably the most convenient.  I would have 
>>> gone with chars.  If I knew how to memset() an array of unsigned:1 
>>> entries to all zero, I would even have gone with that, but the runtime 
>>> cost of this is probably higher than the chars.
>> Well, it isn't used in format_commit_message() currently, but it could 
>> be.  Multiply the count and the length of each substitution (minus 
>> the length of the placeholder) and you get the number of bytes you need 
>> to allocate.  interpolate() wouldn't need to be called twice anymore.
> 
> The better change, of course, would be to migrate interpolate() to strbuf.  
> Then you do not have to play clever tricks anymore.
>
>>> But the even more fundamental problem is that you count the needed 
>>> interpolations _every_ single time you output a commit message.
>>>
>>> A much better place would be get_commit_format().  Yes that means 
>>> restructuring the code a bit more, but I would say that this definitely 
>>> would help.  My preference would even be introducing a new source file for 
>>> the user format handling (commit-format.[ch]).
>> Counting the interpolations is easier than actually interpolating. 
>> Wherever the code goes, the calls to interpolate() and interp_count() 
>> should stay together.
> 
> No.
> 
> The purpose of calling interp_count() is to know what placeholders have to 
> be filled with substitutes.  As a consequence, the _logical_ thing to do 
> is call interp_count() _once_.
> 
> It makes absolutely no sense to call the function over and over again, 
> only to end up with the same result over and over again.

To allow this optimization, you need to make the (not yet filled)
interpolation table available to the new callsite of interp_count().
And you need to somehow pass the result of interp_count() from every
caller of it to the setup code in format_commit_message().

To see if it's worthwhile, I've just replaced the array "occurs" and the
call to interp_count() with a static array, and measured the runtime.
The speed difference was lost in the noise.

>>>> +
>>>> +/*
>>>> + * interp_count - count occurrences of placeholders
>>>> + */
>>>> +void interp_count(unsigned long *result, const char *orig,
>>>> +                  const struct interp *interps, int ninterps)
>>>> +{
>>>> +	const char *src = orig;
>>> You do not ever use orig again.  So why not just use that variable instead 
>>> of introducing a new one?
>> I simply copied interpolate() and then chopped off the parts not needed
>> for counting, to make it easy to see that this is the smaller brother.
> 
> It is not.  It does not do any substitution.  It is a pure helper for the 
> process of filling the interpolation table.

Sure.  It's important, though, that it reports the same number of
substitutions as interpolate() later actually performs.  Correctness
trumps cleanliness, and it's easier to check that a copy is correct, even
if certain pieces are missing.

>>> I'd rewrite this whole loop as
>>>
>>> 	while ((c = *(orig++)))
>>> 		if (c == '%')
>>> 			/* Try to match an interpolation string. */
>>> 			for (i = 0; i < ninterps; i++)
>>> 				if (prefixcmp(orig, interps[i].name)) {
>>> 					result[i] = 1;
>>> 					orig += strlen(interps[i].name);
>>> 					break;
>>> 				}
>> Cleanups are sure possible, but they should be done on top, and to both 
>> interpolate() and interp_count().  Let's first see how far we get with 
>> dumb code-copying and reusing the result in new ways. :)
> 
> Code copying is one of the primary sources for bad code.  Let's not even 
> start.
> 
> IMHO there have to be _very_ good reasons to commit something that you 
> plan to fix later anyway.

Code copying can be bad if one copies bugs.  But code copying allows a
strange feat: new code can inherit maturity.  If you copy known good
code and then change it in trivial ways (keeping the structure etc.) to
make it do new things, then the chance of a bug creeping in is lower
than if you wrote that piece of code anew.

> One such good reason would be that it is too hard to do in one go.  
> Another good reason would be that you think the fix is not even needed 
> (like I did when I wrote format: in the first place; I am quite surprised 
> that after _that_ long a time people complain -- I'd have expected 
> complaints right away or never).

Not everybody is as fast as you, Dscho. ;-)

Another idea that I was kicking around, but didn't get time to
implement: a performance regression test suite, i.e. make test for
timings and memory usages.

> In this case, I see no reason why we should go for suboptimal code first.
> 
> But hey, if you do not want to do it, I'll do it.  Just say so.

Busted again!  I wanted to see if someone else would pick up the
janitorial work for me. :-)

In any case, interpolate.c needs some attention, with or without my
patch.  I agree that a native strbuf version would be nice.  How about
an interface like this:

	typedef const char *(*expand_fn_t)
		(const char *placeholder, void *context);
	void strbuf_addexpand(struct strbuf *sb, const char *format,
	                      const char **placeholders,
	                      expand_fn_t fn, void *context);

strbuf_addexpand() would call fn() when it recognizes a placeholder,
avoiding unneeded setup code.  It could cache the result, so that fn()
gets called at most a single time for each given placeholder.  context
would be passed through to fn(), e.g. a struct commit in case of
format_commit_message().  Makes sense?

Thanks,
René

^ permalink raw reply
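
The callback interface proposed at the end of the message above can be tried out with a small self-contained sketch. Everything here is an assumption made for illustration: struct buf is a minimal stand-in for git's struct strbuf, buf_addexpand() mirrors the proposed strbuf_addexpand() signature, the placeholder array is assumed to be NULL-terminated and to hold names without the leading '%', and demo_expand() is an invented callback. The result caching mentioned in the proposal is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable buffer; a hypothetical stand-in for struct strbuf. */
struct buf {
	char *s;
	size_t len, alloc;
};

/* Append n bytes and keep the buffer NUL-terminated. */
void buf_add(struct buf *b, const char *data, size_t n)
{
	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->len + n + 1) * 2;
		char *p = realloc(b->s, want);

		if (!p)
			abort();
		b->s = p;
		b->alloc = want;
	}
	memcpy(b->s + b->len, data, n);
	b->len += n;
	b->s[b->len] = '\0';
}

typedef const char *(*expand_fn_t)(const char *placeholder, void *context);

/*
 * Append format to b, replacing each known "%name" with the string that
 * fn() returns for it.  fn() is only called for placeholders that
 * actually occur, so no table has to be filled in up front.
 */
void buf_addexpand(struct buf *b, const char *format,
		   const char **placeholders, expand_fn_t fn, void *context)
{
	assert(b && format && placeholders && fn);
	while (*format) {
		const char *percent = strchr(format, '%');
		const char **p;

		if (!percent) {
			buf_add(b, format, strlen(format));
			return;
		}
		buf_add(b, format, percent - format);
		format = percent + 1;
		for (p = placeholders; *p; p++) {
			size_t len = strlen(*p);

			if (len && !strncmp(format, *p, len)) {
				const char *value = fn(*p, context);

				if (value)
					buf_add(b, value, strlen(value));
				format += len;
				break;
			}
		}
		if (!*p)
			buf_add(b, "%", 1);	/* unknown: keep the '%' */
	}
}

/* Invented callback: context is an author name string. */
const char *demo_expand(const char *placeholder, void *context)
{
	if (!strcmp(placeholder, "an"))
		return context;
	if (!strcmp(placeholder, "n"))
		return "\n";
	return NULL;
}
```

With placeholders { "an", "n", NULL } and "A U Thor" as context, expanding "by %an%n" appends "by A U Thor" followed by a newline. Since the placeholder list is searched in order, a name that is a prefix of another (such as "a" and "an") would have to come after the longer one.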

* Re: [RFC PATCH] Make gitk use --early-output
From: Linus Torvalds @ 2007-11-04 17:53 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Paul Mackerras, git
In-Reply-To: <e5bfff550711040237s250bcec0iddf1ebdc616e0bbf@mail.gmail.com>

On Sun, 4 Nov 2007, Marco Costalba wrote:
> 
> But --early-output does not imply --topo-order, I guess...

Well, it does right now, because I imagined that the primary users would 
always want the topological sort.

However, I have to admit that --early-output *could* be used even without 
the topological sort, because it also works for other cases that require 
up-front limiter logic - things like ranges of commits also have to be 
fully evaluated before they are totally certain, so I could imagine seeing 
some visualizer some day that doesn't need the topo-order sort, but does 
want to get a "preliminary" list.

That said, it does seem unlikely. Anybody who asks for --early-output is 
pretty much invariably going to be an interactive visualizer: the whole 
notion doesn't make much sense otherwise. So I think I made the right 
choice in making --early-output imply topo-order, and if somebody ever 
wants to not get the output topologically sorted (unlikely), we could add 
a "--no-topo-order" flag.

Side note: if you want the "--date-order", you do need to specify *both* 
--early-output and --date-order, and it will do the right thing (ie both 
the preliminary output and the final one will be topologically sorted, but 
within that topo-sort it will be in date order rather than clumped by 
the "shape" of the history).

			Linus

^ permalink raw reply

* Re: [REPLACEMENT PATCH 2/2] Add "--early-output" log flag for interactive GUI use
From: Linus Torvalds @ 2007-11-04 18:11 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0711032234030.15101@woody.linux-foundation.org>

On Sat, 3 Nov 2007, Linus Torvalds wrote:
> > 
> > How hard would it be to put the total number of commits on that "Final
> > output" line?  That would be useful for me.
> 
> Not hard. I think we basically have it anyway.

Actually, I take that back.

It's hard. Not because we don't have the commits, but because while we do 
the top-level shape pruning in the early stages, we do *not* do the final 
path-limiting until we actually output the commits.

Which actually makes "--early-output" right now do some rather odd things 
when you use a path limiter: we don't do the "rewrite_parents()" thing 
until later, so the early output will have done the first level of history 
simplification, but it won't have made history *dense* yet.

I'm looking at it now, I'll have to think about this a bit more. It might 
be trivial to fix, but this thing has real potential for being subtle.

			Linus

^ permalink raw reply

* Re: [PATCH qgit] Add support for --early-output option of git log command
From: Marco Costalba @ 2007-11-04 18:15 UTC (permalink / raw)
  To: Michael J. Cohen; +Cc: Git Mailing List
In-Reply-To: <34C93069-06F8-44DA-A18F-EE36BB457ABC@mac.com>

On 11/4/07, Michael J. Cohen <michaeljosephcohen@mac.com> wrote:
> On Nov 4, 2007, at 5:25 AM, Marco Costalba wrote:
>
> >       bool populateRenamedPatches(SCRef sha, SCList nn, FileHistory* fh,
> > QStringList* on, bool bt);
>
> **** malformed patch at line 137: QStringList* on, bool bt);
>
> looks like it was wrapped...
>

Sorry, it's a problem with gmail. Please tell me if you want me to
resend it as an attachment or would rather fix the patch yourself.

Marco

^ permalink raw reply

* Re: [RFC PATCH] Make gitk use --early-output
From: David Kastrup @ 2007-11-04 18:28 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, git
In-Reply-To: <18221.2285.259487.655684@cargo.ozlabs.ibm.com>

Paul Mackerras <paulus@samba.org> writes:

> This makes gitk use the --early-output flag on the git log command.
>
> When gitk sees the "Final output:" line from git log, it goes into a
> mode where it basically just checks that it is getting the commits
> again in the same order as before.  If they are, well and good; if
> not, it truncates its internal list at the point of difference and
> proceeds to read in the commits in the new order from there on, and
> re-does the graph layout if necessary.
>
> This gives a much more immediate feel to the startup; gitk shows its
> window with the first screenful of commits displayed very quickly this
> way.

This is not strictly related to the patch: would it be possible to let
gitk just stall reading from git-rev-list if it has rendered enough
content on-screen?  The behavior I have with gitk on enormous
repositories now is that it starts up reasonably fast and nice and then
proceeds to suck up all memory in the background.

Particularly annoying is that closing its window appears to work, but
wish will still proceed sucking up all the pending git-rev-list output
and allocating memory for it before it will actually exit.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: [StGit RFC] A more structured way of calling git
From: Karl Hasselström @ 2007-11-04 18:34 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson
In-Reply-To: <b0943d9e0711030356j4dcd31cbl54d838107240b3d0@mail.gmail.com>

On 2007-11-03 10:56:36 +0000, Catalin Marinas wrote:

> On 26/10/2007, Karl Hasselström <kha@treskal.com> wrote:
>
> > I wanted to build an StGit command that coalesced adjacent patches
> > to a single patch. Because the end result tree would still be the
> > same, this should be doable without ever involving HEAD, the
> > index, or the worktree.
>
> Wouldn't HEAD need to be modified since the commit log changes
> slightly, even though the tree is the same? Or am I misunderstanding
> this?

I'm referring to the HEAD tree. The HEAD commit will of course change.

> > StGit's existing infrastructure for manipulating patches didn't
> > lend itself to doing this kind of thing, though: it's not modular
> > enough. So I started to design a replacement low-level interface
> > to git, and things got slightly out of hand ... and I ended up
> > with a much bigger refactoring than I'd planned.
>
> Thanks for this. I'll need a bit of time to read it all and give
> feedback. In general, I welcome this refactoring.
>
> I'll go through the whole e-mail in the next days and get back to
> you.

Thanks, I appreciate it.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: [StGit RFC] A more structured way of calling git
From: Karl Hasselström @ 2007-11-04 18:40 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Catalin Marinas, David Kågedal, Git Mailing List
In-Reply-To: <20071103142851.GG26436@nan92-1-81-57-214-146.fbx.proxad.net>

On 2007-11-03 15:28:51 +0100, Yann Dirson wrote:

> This reminds me of someone suggesting that some patches could be
> represented by more than one commit.

You might be remembering me pointing out that the old infrastructure
supported (or at least did not directly disallow) this.

> But I'm not sure such a beast would be useful - I fear that would
> make StGIT much more complicated, but would it really make things
> better?

Yes, it makes everything much more complicated, and no, it doesn't buy
us anything new. After all, once we know the parent and the tree we
want a patch to have, we can just manufacture a commit that has that
tree and that parent.

My proposed new infrastructure cannot represent such patches, very
much by design.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* [PATCH] Add more tests for git-clean
From: Shawn Bohrer @ 2007-11-04 19:02 UTC (permalink / raw)
  To: git; +Cc: gitster, Shawn Bohrer
In-Reply-To: <1194202941253-git-send-email-shawn.bohrer@gmail.com>

Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com>
---
 t/t7300-clean.sh |  109 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 8697213..d74c11c 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -39,6 +39,97 @@ test_expect_success 'git-clean' '
 
 '
 
+test_expect_success 'git-clean src/' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	git-clean src/ &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
+test_expect_success 'git-clean src/ src/' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	git-clean src/ src/ &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
+test_expect_success 'git-clean with prefix' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	cd src/ &&
+	git-clean &&
+	cd - &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+test_expect_success 'git-clean -d with prefix and path' '
+
+	mkdir -p build docs src/feature &&
+	touch a.out src/part3.c src/feature/file.c docs/manual.txt obj.o build/lib.so &&
+	cd src/ &&
+	git-clean -d feature/ &&
+	cd - &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test -f src/part3.c &&
+	test ! -f src/feature/file.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
+test_expect_success 'git-clean symbolic link' '
+
+	mkdir -p build docs &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so &&
+	ln -s docs/manual.txt src/part4.c &&
+	git-clean &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test ! -f a.out &&
+	test ! -f src/part3.c &&
+	test ! -f src/part4.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
 test_expect_success 'git-clean -n' '
 
 	mkdir -p build docs &&
@@ -73,6 +164,24 @@ test_expect_success 'git-clean -d' '
 
 '
 
+test_expect_success 'git-clean -d src/ examples/' '
+
+	mkdir -p build docs examples &&
+	touch a.out src/part3.c docs/manual.txt obj.o build/lib.so examples/1.c &&
+	git-clean -d src/ examples/ &&
+	test -f Makefile &&
+	test -f README &&
+	test -f src/part1.c &&
+	test -f src/part2.c &&
+	test -f a.out &&
+	test ! -f src/part3.c &&
+	test ! -f examples/1.c &&
+	test -f docs/manual.txt &&
+	test -f obj.o &&
+	test -f build/lib.so
+
+'
+
 test_expect_success 'git-clean -x' '
 
 	mkdir -p build docs &&
-- 
1.5.3.GIT

^ permalink raw reply related

* [RFC] Second attempt at making git-clean a builtin
From: Shawn Bohrer @ 2007-11-04 19:02 UTC (permalink / raw)
  To: git; +Cc: gitster

I've taken into account all of the comments I received on my previous attempt; see:

http://marc.info/?l=git&m=119181975419521&w=2

With these new changes in place my new git-clean passes all of the
original tests as well as the new tests I've added.  While looking at
how git-ls-files walks the tree there were some things that I didn't quite
understand, or thought might be unnecessary so there may be some things I
missed.  For example I'm still not quite sure what verify_pathspec()
does.

I did however notice what I would call a bug in the behavior of
git-ls-files and therefore the current git-clean.sh.  With the current
git-clean if you have two directories that contain only untracked files,
for example docs/ and examples/ running:

git clean docs/ examples/

will not remove either directory.  Instead you must use the -d
parameter.  To me this makes sense, however if you run:

git clean docs/

it will remove the docs directory without using the -d parameter.  My
patch is at least consistent in that it requires the -d in both cases.

^ permalink raw reply

* [PATCH] Make git-clean a builtin
From: Shawn Bohrer @ 2007-11-04 19:02 UTC (permalink / raw)
  To: git; +Cc: gitster, Shawn Bohrer
In-Reply-To: <11942029442710-git-send-email-shawn.bohrer@gmail.com>

This replaces git-clean.sh with builtin-clean.c, and moves git-clean.sh to
the examples.

Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com>
---
 Makefile                                      |    3 +-
 builtin-clean.c                               |  157 +++++++++++++++++++++++++
 builtin.h                                     |    1 +
 git-clean.sh => contrib/examples/git-clean.sh |    0 
 git.c                                         |    1 +
 5 files changed, 161 insertions(+), 1 deletions(-)
 create mode 100644 builtin-clean.c
 rename git-clean.sh => contrib/examples/git-clean.sh (100%)

diff --git a/Makefile b/Makefile
index 3ec1876..fad49b2 100644
--- a/Makefile
+++ b/Makefile
@@ -209,7 +209,7 @@ BASIC_LDFLAGS =
 
 SCRIPT_SH = \
 	git-bisect.sh git-checkout.sh \
-	git-clean.sh git-clone.sh git-commit.sh \
+	git-clone.sh git-commit.sh \
 	git-ls-remote.sh \
 	git-merge-one-file.sh git-mergetool.sh git-parse-remote.sh \
 	git-pull.sh git-rebase.sh git-rebase--interactive.sh \
@@ -326,6 +326,7 @@ BUILTIN_OBJS = \
 	builtin-check-attr.o \
 	builtin-checkout-index.o \
 	builtin-check-ref-format.o \
+	builtin-clean.o \
 	builtin-commit-tree.o \
 	builtin-count-objects.o \
 	builtin-describe.o \
diff --git a/builtin-clean.c b/builtin-clean.c
new file mode 100644
index 0000000..4141eb4
--- /dev/null
+++ b/builtin-clean.c
@@ -0,0 +1,157 @@
+/*
+ * "git clean" builtin command
+ *
+ * Copyright (C) 2007 Shawn Bohrer
+ *
+ * Based on git-clean.sh by Pavel Roskin
+ */
+
+#include "builtin.h"
+#include "cache.h"
+#include "dir.h"
+
+static int disabled = 1;
+static int show_only = 0;
+static int remove_directories = 0;
+static int quiet = 0;
+static int ignored = 0;
+static int ignored_only = 0;
+
+static const char builtin_clean_usage[] =
+"git-clean [-d] [-f] [-n] [-q] [-x | -X] [--] <paths>...";
+
+static int git_clean_config(const char *var, const char *value)
+{
+	if (!strcmp(var, "clean.requireforce")) {
+		disabled = git_config_bool(var, value);
+	}
+	return 0;
+}
+
+int cmd_clean(int argc, const char **argv, const char *prefix)
+{
+	int i, j;
+	struct strbuf directory;
+	struct dir_struct dir;
+	const char *path = ".";
+	const char *base = "";
+	int baselen = 0;
+	static const char **pathspec;
+
+	memset(&dir, 0, sizeof(dir));
+	git_config(git_clean_config);
+
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+
+		if (arg[0] != '-')
+			break;
+		if (!strcmp(arg, "--")) {
+			i++;
+			break;
+		}
+		if (!strcmp(arg, "-n")) {
+			show_only = 1;
+			disabled = 0;
+			continue;
+		}
+		if (!strcmp(arg, "-f")) {
+			disabled = 0;
+			continue;
+		}
+		if (!strcmp(arg, "-d")) {
+			remove_directories = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-q")) {
+			quiet = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-x")) {
+			ignored = 1;
+			continue;
+		}
+		if (!strcmp(arg, "-X")) {
+			ignored_only = 1;
+			dir.show_ignored = 1;
+			dir.exclude_per_dir = ".gitignore";
+			continue;
+		}
+		usage(builtin_clean_usage);
+	}
+
+	if (ignored && ignored_only)
+		die("-x and -X cannot be used together");
+
+	if (disabled)
+		die("clean.requireForce set and -n or -f not given; refusing to clean");
+
+	dir.show_other_directories = 1;
+
+	if (!ignored) {
+		dir.exclude_per_dir = ".gitignore";
+		if (!access(git_path("info/exclude"), F_OK)) {
+			char *exclude_path = git_path("info/exclude");
+			add_excludes_from_file(&dir, exclude_path);
+		}
+	}
+
+	pathspec = get_pathspec(prefix, argv + i);
+	read_cache();
+	read_directory(&dir, path, base, baselen, pathspec);
+	strbuf_init(&directory, 0);
+
+	for (j = 0; j < dir.nr; ++j) {
+		struct dir_entry *ent = dir.entries[j];
+		int len, pos;
+		struct cache_entry *ce;
+		struct stat st;
+
+		/*
+		 * Remove the '/' at the end that directory
+		 * walking adds for directory entries.
+		 */
+		len = ent->len;
+		if (len && ent->name[len-1] == '/')
+			len--;
+		pos = cache_name_pos(ent->name, len);
+		if (0 <= pos)
+			continue;	/* exact match */
+		pos = -pos - 1;
+		if (pos < active_nr) {
+			ce = active_cache[pos];
+			if (ce_namelen(ce) == len &&
+			    !memcmp(ce->name, ent->name, len))
+				continue; /* Yup, this one exists unmerged */
+		}
+
+		/* remove the files */
+		if (!lstat(ent->name, &st) && (S_ISDIR(st.st_mode))) {
+			strbuf_addstr(&directory, ent->name);
+			if (show_only && remove_directories) {
+				printf("Would remove %s\n", directory.buf);
+			} else if (quiet && remove_directories) {
+				remove_dir_recursively(&directory, 0);
+			} else if (remove_directories) {
+				printf("Removing %s\n", ent->name);
+				remove_dir_recursively(&directory, 0);
+			} else if (show_only) {
+				printf("Would not remove %s\n", directory.buf);
+			} else {
+				printf("Not removing %s\n", directory.buf);
+			}
+			strbuf_reset(&directory);
+		} else {
+			if (show_only) {
+				printf("Would remove %s\n", ent->name);
+				continue;
+			} else if (!quiet) {
+				printf("Removing %s\n", ent->name);
+			}
+			unlink(ent->name);
+		}
+	}
+
+	strbuf_release(&directory);
+	return 0;
+}
diff --git a/builtin.h b/builtin.h
index 2335c01..0cbd685 100644
--- a/builtin.h
+++ b/builtin.h
@@ -24,6 +24,7 @@ extern int cmd_check_attr(int argc, const char **argv, const char *prefix);
 extern int cmd_check_ref_format(int argc, const char **argv, const char *prefix);
 extern int cmd_cherry(int argc, const char **argv, const char *prefix);
 extern int cmd_cherry_pick(int argc, const char **argv, const char *prefix);
+extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_describe(int argc, const char **argv, const char *prefix);
diff --git a/git-clean.sh b/contrib/examples/git-clean.sh
similarity index 100%
rename from git-clean.sh
rename to contrib/examples/git-clean.sh
diff --git a/git.c b/git.c
index 19a2172..30b7c22 100644
--- a/git.c
+++ b/git.c
@@ -298,6 +298,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "check-attr", cmd_check_attr, RUN_SETUP | NEED_WORK_TREE },
 		{ "cherry", cmd_cherry, RUN_SETUP },
 		{ "cherry-pick", cmd_cherry_pick, RUN_SETUP | NEED_WORK_TREE },
+		{ "clean", cmd_clean, RUN_SETUP | NEED_WORK_TREE },
 		{ "commit-tree", cmd_commit_tree, RUN_SETUP },
 		{ "config", cmd_config },
 		{ "count-objects", cmd_count_objects, RUN_SETUP },
-- 
1.5.3.GIT
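
For readers skimming the patch: the refusal check in cmd_clean() boils down to one flag. The sketch below restates it as a standalone function. The helper name and its parameters are hypothetical and not part of the patch; only the logic (refuse by default, let clean.requireForce override the default, let -n or -f lift the refusal) is taken from the code above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not in the patch: mirrors how the static
 * `disabled` flag evolves in cmd_clean().
 *
 * require_force: value of clean.requireForce, or true when unset
 *                (the patch initializes `disabled` to 1).
 * opt_n, opt_f:  whether -n or -f was given on the command line.
 */
bool clean_is_refused(bool require_force, bool opt_n, bool opt_f)
{
	bool disabled = require_force;

	/* Both -n (dry run) and -f (force) clear the flag. */
	if (opt_n || opt_f)
		disabled = false;

	return disabled;
}
```

With no configuration and no options this returns true, which corresponds to the die("clean.requireForce set and -n or -f not given; refusing to clean") path.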

^ permalink raw reply related

* [PATCH 0/3] Make user formatted commit listing less expensive
From: Johannes Schindelin @ 2007-11-04 19:14 UTC (permalink / raw)
  To: git, Rene Scharfe, gitster

Hi,

this series of three patches splits off the formatting code from commit.c,
adds the function interp_find_active() to interpolate.[ch], and then uses
it in the obvious way.

Ciao,
Dscho
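
To illustrate the idea behind the subject line: format_commit_message() fills every entry of its placeholder table, even when the format string mentions only a few of them. A minimal sketch of the alternative, scanning the format first and remembering which placeholders are in use, might look as follows. This is an assumption about the approach, not the code from this series: struct placeholder and mark_active() are made-up names, and the real interp_find_active() may have a different signature and behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a table entry; this is not git's struct interp. */
struct placeholder {
	const char *name;	/* e.g. "%an" */
	int active;		/* 1 if the format string mentions it */
};

/*
 * Mark the placeholders that occur in `format`, so that a caller can
 * skip computing values nobody asked for.  Returns the number of
 * active entries.  A plain substring search may over-report in odd
 * cases; that only costs some unnecessary work.
 */
int mark_active(const char *format, struct placeholder *table, size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		table[i].active = strstr(format, table[i].name) != NULL;
		count += table[i].active;
	}
	return count;
}
```

A caller would then test table[i].active before doing expensive work such as looking up a unique abbreviation or formatting a date.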

^ permalink raw reply

* [PATCH 1/3] Split off the pretty print stuff into its own file
From: Johannes Schindelin @ 2007-11-04 19:15 UTC (permalink / raw)
  To: git, Rene Scharfe, gitster
In-Reply-To: <Pine.LNX.4.64.0711041912190.4362@racer.site>


The file commit.c got quite large, but it does not have to be: the
code concerning pretty printing is pretty well contained.  In fact,
this commit just splits it off into pretty.c, leaving commit.c with
just 672 lines.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	I know, I suggested using "format-patch -C -C" for this case, but
	the response was right that the result looks funny.

	AFAICT this is a verbatim move of two hunks from commit.c to
	pretty.c, and the usual #include mantra in front of the latter to
	make it compile.

 Makefile |    2 +-
 commit.c |  718 -------------------------------------------------------------
 pretty.c |  723 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 724 insertions(+), 719 deletions(-)
 create mode 100644 pretty.c

diff --git a/Makefile b/Makefile
index 19a48f5..4c5f864 100644
--- a/Makefile
+++ b/Makefile
@@ -299,7 +299,7 @@ DIFF_OBJS = \
 LIB_OBJS = \
 	blob.o commit.o connect.o csum-file.o cache-tree.o base85.o \
 	date.o diff-delta.o entry.o exec_cmd.o ident.o \
-	interpolate.o hash.o \
+	pretty.o interpolate.o hash.o \
 	lockfile.o \
 	patch-ids.o \
 	object.o pack-check.o pack-write.o patch-delta.o path.o pkt-line.o \
diff --git a/commit.c b/commit.c
index ab4eb8b..f074811 100644
--- a/commit.c
+++ b/commit.c
@@ -3,7 +3,6 @@
 #include "commit.h"
 #include "pkt-line.h"
 #include "utf8.h"
-#include "interpolate.h"
 #include "diff.h"
 #include "revision.h"
 
@@ -11,46 +10,6 @@ int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
 
-static struct cmt_fmt_map {
-	const char *n;
-	size_t cmp_len;
-	enum cmit_fmt v;
-} cmt_fmts[] = {
-	{ "raw",	1,	CMIT_FMT_RAW },
-	{ "medium",	1,	CMIT_FMT_MEDIUM },
-	{ "short",	1,	CMIT_FMT_SHORT },
-	{ "email",	1,	CMIT_FMT_EMAIL },
-	{ "full",	5,	CMIT_FMT_FULL },
-	{ "fuller",	5,	CMIT_FMT_FULLER },
-	{ "oneline",	1,	CMIT_FMT_ONELINE },
-	{ "format:",	7,	CMIT_FMT_USERFORMAT},
-};
-
-static char *user_format;
-
-enum cmit_fmt get_commit_format(const char *arg)
-{
-	int i;
-
-	if (!arg || !*arg)
-		return CMIT_FMT_DEFAULT;
-	if (*arg == '=')
-		arg++;
-	if (!prefixcmp(arg, "format:")) {
-		if (user_format)
-			free(user_format);
-		user_format = xstrdup(arg + 7);
-		return CMIT_FMT_USERFORMAT;
-	}
-	for (i = 0; i < ARRAY_SIZE(cmt_fmts); i++) {
-		if (!strncmp(arg, cmt_fmts[i].n, cmt_fmts[i].cmp_len) &&
-		    !strncmp(arg, cmt_fmts[i].n, strlen(arg)))
-			return cmt_fmts[i].v;
-	}
-
-	die("invalid --pretty format: %s", arg);
-}
-
 static struct commit *check_commit(struct object *obj,
 				   const unsigned char *sha1,
 				   int quiet)
@@ -444,683 +403,6 @@ void clear_commit_marks(struct commit *commit, unsigned int mark)
 	}
 }
 
-/*
- * Generic support for pretty-printing the header
- */
-static int get_one_line(const char *msg)
-{
-	int ret = 0;
-
-	for (;;) {
-		char c = *msg++;
-		if (!c)
-			break;
-		ret++;
-		if (c == '\n')
-			break;
-	}
-	return ret;
-}
-
-/* High bit set, or ISO-2022-INT */
-int non_ascii(int ch)
-{
-	ch = (ch & 0xff);
-	return ((ch & 0x80) || (ch == 0x1b));
-}
-
-static int is_rfc2047_special(char ch)
-{
-	return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_'));
-}
-
-static void add_rfc2047(struct strbuf *sb, const char *line, int len,
-		       const char *encoding)
-{
-	int i, last;
-
-	for (i = 0; i < len; i++) {
-		int ch = line[i];
-		if (non_ascii(ch))
-			goto needquote;
-		if ((i + 1 < len) && (ch == '=' && line[i+1] == '?'))
-			goto needquote;
-	}
-	strbuf_add(sb, line, len);
-	return;
-
-needquote:
-	strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
-	strbuf_addf(sb, "=?%s?q?", encoding);
-	for (i = last = 0; i < len; i++) {
-		unsigned ch = line[i] & 0xFF;
-		/*
-		 * We encode ' ' using '=20' even though rfc2047
-		 * allows using '_' for readability.  Unfortunately,
-		 * many programs do not understand this and just
-		 * leave the underscore in place.
-		 */
-		if (is_rfc2047_special(ch) || ch == ' ') {
-			strbuf_add(sb, line + last, i - last);
-			strbuf_addf(sb, "=%02X", ch);
-			last = i + 1;
-		}
-	}
-	strbuf_add(sb, line + last, len - last);
-	strbuf_addstr(sb, "?=");
-}
-
-static void add_user_info(const char *what, enum cmit_fmt fmt, struct strbuf *sb,
-			 const char *line, enum date_mode dmode,
-			 const char *encoding)
-{
-	char *date;
-	int namelen;
-	unsigned long time;
-	int tz;
-	const char *filler = "    ";
-
-	if (fmt == CMIT_FMT_ONELINE)
-		return;
-	date = strchr(line, '>');
-	if (!date)
-		return;
-	namelen = ++date - line;
-	time = strtoul(date, &date, 10);
-	tz = strtol(date, NULL, 10);
-
-	if (fmt == CMIT_FMT_EMAIL) {
-		char *name_tail = strchr(line, '<');
-		int display_name_length;
-		if (!name_tail)
-			return;
-		while (line < name_tail && isspace(name_tail[-1]))
-			name_tail--;
-		display_name_length = name_tail - line;
-		filler = "";
-		strbuf_addstr(sb, "From: ");
-		add_rfc2047(sb, line, display_name_length, encoding);
-		strbuf_add(sb, name_tail, namelen - display_name_length);
-		strbuf_addch(sb, '\n');
-	} else {
-		strbuf_addf(sb, "%s: %.*s%.*s\n", what,
-			      (fmt == CMIT_FMT_FULLER) ? 4 : 0,
-			      filler, namelen, line);
-	}
-	switch (fmt) {
-	case CMIT_FMT_MEDIUM:
-		strbuf_addf(sb, "Date:   %s\n", show_date(time, tz, dmode));
-		break;
-	case CMIT_FMT_EMAIL:
-		strbuf_addf(sb, "Date: %s\n", show_date(time, tz, DATE_RFC2822));
-		break;
-	case CMIT_FMT_FULLER:
-		strbuf_addf(sb, "%sDate: %s\n", what, show_date(time, tz, dmode));
-		break;
-	default:
-		/* notin' */
-		break;
-	}
-}
-
-static int is_empty_line(const char *line, int *len_p)
-{
-	int len = *len_p;
-	while (len && isspace(line[len-1]))
-		len--;
-	*len_p = len;
-	return !len;
-}
-
-static void add_merge_info(enum cmit_fmt fmt, struct strbuf *sb,
-			const struct commit *commit, int abbrev)
-{
-	struct commit_list *parent = commit->parents;
-
-	if ((fmt == CMIT_FMT_ONELINE) || (fmt == CMIT_FMT_EMAIL) ||
-	    !parent || !parent->next)
-		return;
-
-	strbuf_addstr(sb, "Merge:");
-
-	while (parent) {
-		struct commit *p = parent->item;
-		const char *hex = NULL;
-		const char *dots;
-		if (abbrev)
-			hex = find_unique_abbrev(p->object.sha1, abbrev);
-		if (!hex)
-			hex = sha1_to_hex(p->object.sha1);
-		dots = (abbrev && strlen(hex) != 40) ?  "..." : "";
-		parent = parent->next;
-
-		strbuf_addf(sb, " %s%s", hex, dots);
-	}
-	strbuf_addch(sb, '\n');
-}
-
-static char *get_header(const struct commit *commit, const char *key)
-{
-	int key_len = strlen(key);
-	const char *line = commit->buffer;
-
-	for (;;) {
-		const char *eol = strchr(line, '\n'), *next;
-
-		if (line == eol)
-			return NULL;
-		if (!eol) {
-			eol = line + strlen(line);
-			next = NULL;
-		} else
-			next = eol + 1;
-		if (eol - line > key_len &&
-		    !strncmp(line, key, key_len) &&
-		    line[key_len] == ' ') {
-			return xmemdupz(line + key_len + 1, eol - line - key_len - 1);
-		}
-		line = next;
-	}
-}
-
-static char *replace_encoding_header(char *buf, const char *encoding)
-{
-	struct strbuf tmp;
-	size_t start, len;
-	char *cp = buf;
-
-	/* guess if there is an encoding header before a \n\n */
-	while (strncmp(cp, "encoding ", strlen("encoding "))) {
-		cp = strchr(cp, '\n');
-		if (!cp || *++cp == '\n')
-			return buf;
-	}
-	start = cp - buf;
-	cp = strchr(cp, '\n');
-	if (!cp)
-		return buf; /* should not happen but be defensive */
-	len = cp + 1 - (buf + start);
-
-	strbuf_init(&tmp, 0);
-	strbuf_attach(&tmp, buf, strlen(buf), strlen(buf) + 1);
-	if (is_encoding_utf8(encoding)) {
-		/* we have re-coded to UTF-8; drop the header */
-		strbuf_remove(&tmp, start, len);
-	} else {
-		/* just replaces XXXX in 'encoding XXXX\n' */
-		strbuf_splice(&tmp, start + strlen("encoding "),
-					  len - strlen("encoding \n"),
-					  encoding, strlen(encoding));
-	}
-	return strbuf_detach(&tmp, NULL);
-}
-
-static char *logmsg_reencode(const struct commit *commit,
-			     const char *output_encoding)
-{
-	static const char *utf8 = "utf-8";
-	const char *use_encoding;
-	char *encoding;
-	char *out;
-
-	if (!*output_encoding)
-		return NULL;
-	encoding = get_header(commit, "encoding");
-	use_encoding = encoding ? encoding : utf8;
-	if (!strcmp(use_encoding, output_encoding))
-		if (encoding) /* we'll strip encoding header later */
-			out = xstrdup(commit->buffer);
-		else
-			return NULL; /* nothing to do */
-	else
-		out = reencode_string(commit->buffer,
-				      output_encoding, use_encoding);
-	if (out)
-		out = replace_encoding_header(out, output_encoding);
-
-	free(encoding);
-	return out;
-}
-
-static void fill_person(struct interp *table, const char *msg, int len)
-{
-	int start, end, tz = 0;
-	unsigned long date;
-	char *ep;
-
-	/* parse name */
-	for (end = 0; end < len && msg[end] != '<'; end++)
-		; /* do nothing */
-	start = end + 1;
-	while (end > 0 && isspace(msg[end - 1]))
-		end--;
-	table[0].value = xmemdupz(msg, end);
-
-	if (start >= len)
-		return;
-
-	/* parse email */
-	for (end = start + 1; end < len && msg[end] != '>'; end++)
-		; /* do nothing */
-
-	if (end >= len)
-		return;
-
-	table[1].value = xmemdupz(msg + start, end - start);
-
-	/* parse date */
-	for (start = end + 1; start < len && isspace(msg[start]); start++)
-		; /* do nothing */
-	if (start >= len)
-		return;
-	date = strtoul(msg + start, &ep, 10);
-	if (msg + start == ep)
-		return;
-
-	table[5].value = xmemdupz(msg + start, ep - (msg + start));
-
-	/* parse tz */
-	for (start = ep - msg + 1; start < len && isspace(msg[start]); start++)
-		; /* do nothing */
-	if (start + 1 < len) {
-		tz = strtoul(msg + start + 1, NULL, 10);
-		if (msg[start] == '-')
-			tz = -tz;
-	}
-
-	interp_set_entry(table, 2, show_date(date, tz, DATE_NORMAL));
-	interp_set_entry(table, 3, show_date(date, tz, DATE_RFC2822));
-	interp_set_entry(table, 4, show_date(date, tz, DATE_RELATIVE));
-	interp_set_entry(table, 6, show_date(date, tz, DATE_ISO8601));
-}
-
-void format_commit_message(const struct commit *commit,
-                           const void *format, struct strbuf *sb)
-{
-	struct interp table[] = {
-		{ "%H" },	/* commit hash */
-		{ "%h" },	/* abbreviated commit hash */
-		{ "%T" },	/* tree hash */
-		{ "%t" },	/* abbreviated tree hash */
-		{ "%P" },	/* parent hashes */
-		{ "%p" },	/* abbreviated parent hashes */
-		{ "%an" },	/* author name */
-		{ "%ae" },	/* author email */
-		{ "%ad" },	/* author date */
-		{ "%aD" },	/* author date, RFC2822 style */
-		{ "%ar" },	/* author date, relative */
-		{ "%at" },	/* author date, UNIX timestamp */
-		{ "%ai" },	/* author date, ISO 8601 */
-		{ "%cn" },	/* committer name */
-		{ "%ce" },	/* committer email */
-		{ "%cd" },	/* committer date */
-		{ "%cD" },	/* committer date, RFC2822 style */
-		{ "%cr" },	/* committer date, relative */
-		{ "%ct" },	/* committer date, UNIX timestamp */
-		{ "%ci" },	/* committer date, ISO 8601 */
-		{ "%e" },	/* encoding */
-		{ "%s" },	/* subject */
-		{ "%b" },	/* body */
-		{ "%Cred" },	/* red */
-		{ "%Cgreen" },	/* green */
-		{ "%Cblue" },	/* blue */
-		{ "%Creset" },	/* reset color */
-		{ "%n" },	/* newline */
-		{ "%m" },	/* left/right/bottom */
-	};
-	enum interp_index {
-		IHASH = 0, IHASH_ABBREV,
-		ITREE, ITREE_ABBREV,
-		IPARENTS, IPARENTS_ABBREV,
-		IAUTHOR_NAME, IAUTHOR_EMAIL,
-		IAUTHOR_DATE, IAUTHOR_DATE_RFC2822, IAUTHOR_DATE_RELATIVE,
-		IAUTHOR_TIMESTAMP, IAUTHOR_ISO8601,
-		ICOMMITTER_NAME, ICOMMITTER_EMAIL,
-		ICOMMITTER_DATE, ICOMMITTER_DATE_RFC2822,
-		ICOMMITTER_DATE_RELATIVE, ICOMMITTER_TIMESTAMP,
-		ICOMMITTER_ISO8601,
-		IENCODING,
-		ISUBJECT,
-		IBODY,
-		IRED, IGREEN, IBLUE, IRESET_COLOR,
-		INEWLINE,
-		ILEFT_RIGHT,
-	};
-	struct commit_list *p;
-	char parents[1024];
-	unsigned long len;
-	int i;
-	enum { HEADER, SUBJECT, BODY } state;
-	const char *msg = commit->buffer;
-
-	if (ILEFT_RIGHT + 1 != ARRAY_SIZE(table))
-		die("invalid interp table!");
-
-	/* these are independent of the commit */
-	interp_set_entry(table, IRED, "\033[31m");
-	interp_set_entry(table, IGREEN, "\033[32m");
-	interp_set_entry(table, IBLUE, "\033[34m");
-	interp_set_entry(table, IRESET_COLOR, "\033[m");
-	interp_set_entry(table, INEWLINE, "\n");
-
-	/* these depend on the commit */
-	if (!commit->object.parsed)
-		parse_object(commit->object.sha1);
-	interp_set_entry(table, IHASH, sha1_to_hex(commit->object.sha1));
-	interp_set_entry(table, IHASH_ABBREV,
-			find_unique_abbrev(commit->object.sha1,
-				DEFAULT_ABBREV));
-	interp_set_entry(table, ITREE, sha1_to_hex(commit->tree->object.sha1));
-	interp_set_entry(table, ITREE_ABBREV,
-			find_unique_abbrev(commit->tree->object.sha1,
-				DEFAULT_ABBREV));
-	interp_set_entry(table, ILEFT_RIGHT,
-			 (commit->object.flags & BOUNDARY)
-			 ? "-"
-			 : (commit->object.flags & SYMMETRIC_LEFT)
-			 ? "<"
-			 : ">");
-
-	parents[1] = 0;
-	for (i = 0, p = commit->parents;
-			p && i < sizeof(parents) - 1;
-			p = p->next)
-		i += snprintf(parents + i, sizeof(parents) - i - 1, " %s",
-			sha1_to_hex(p->item->object.sha1));
-	interp_set_entry(table, IPARENTS, parents + 1);
-
-	parents[1] = 0;
-	for (i = 0, p = commit->parents;
-			p && i < sizeof(parents) - 1;
-			p = p->next)
-		i += snprintf(parents + i, sizeof(parents) - i - 1, " %s",
-			find_unique_abbrev(p->item->object.sha1,
-				DEFAULT_ABBREV));
-	interp_set_entry(table, IPARENTS_ABBREV, parents + 1);
-
-	for (i = 0, state = HEADER; msg[i] && state < BODY; i++) {
-		int eol;
-		for (eol = i; msg[eol] && msg[eol] != '\n'; eol++)
-			; /* do nothing */
-
-		if (state == SUBJECT) {
-			table[ISUBJECT].value = xmemdupz(msg + i, eol - i);
-			i = eol;
-		}
-		if (i == eol) {
-			state++;
-			/* strip empty lines */
-			while (msg[eol + 1] == '\n')
-				eol++;
-		} else if (!prefixcmp(msg + i, "author "))
-			fill_person(table + IAUTHOR_NAME,
-					msg + i + 7, eol - i - 7);
-		else if (!prefixcmp(msg + i, "committer "))
-			fill_person(table + ICOMMITTER_NAME,
-					msg + i + 10, eol - i - 10);
-		else if (!prefixcmp(msg + i, "encoding "))
-			table[IENCODING].value =
-				xmemdupz(msg + i + 9, eol - i - 9);
-		i = eol;
-	}
-	if (msg[i])
-		table[IBODY].value = xstrdup(msg + i);
-
-	len = interpolate(sb->buf + sb->len, strbuf_avail(sb),
-				format, table, ARRAY_SIZE(table));
-	if (len > strbuf_avail(sb)) {
-		strbuf_grow(sb, len);
-		interpolate(sb->buf + sb->len, strbuf_avail(sb) + 1,
-					format, table, ARRAY_SIZE(table));
-	}
-	strbuf_setlen(sb, sb->len + len);
-	interp_clear_table(table, ARRAY_SIZE(table));
-}
-
-static void pp_header(enum cmit_fmt fmt,
-		      int abbrev,
-		      enum date_mode dmode,
-		      const char *encoding,
-		      const struct commit *commit,
-		      const char **msg_p,
-		      struct strbuf *sb)
-{
-	int parents_shown = 0;
-
-	for (;;) {
-		const char *line = *msg_p;
-		int linelen = get_one_line(*msg_p);
-
-		if (!linelen)
-			return;
-		*msg_p += linelen;
-
-		if (linelen == 1)
-			/* End of header */
-			return;
-
-		if (fmt == CMIT_FMT_RAW) {
-			strbuf_add(sb, line, linelen);
-			continue;
-		}
-
-		if (!memcmp(line, "parent ", 7)) {
-			if (linelen != 48)
-				die("bad parent line in commit");
-			continue;
-		}
-
-		if (!parents_shown) {
-			struct commit_list *parent;
-			int num;
-			for (parent = commit->parents, num = 0;
-			     parent;
-			     parent = parent->next, num++)
-				;
-			/* with enough slop */
-			strbuf_grow(sb, num * 50 + 20);
-			add_merge_info(fmt, sb, commit, abbrev);
-			parents_shown = 1;
-		}
-
-		/*
-		 * MEDIUM == DEFAULT shows only author with dates.
-		 * FULL shows both authors but not dates.
-		 * FULLER shows both authors and dates.
-		 */
-		if (!memcmp(line, "author ", 7)) {
-			strbuf_grow(sb, linelen + 80);
-			add_user_info("Author", fmt, sb, line + 7, dmode, encoding);
-		}
-		if (!memcmp(line, "committer ", 10) &&
-		    (fmt == CMIT_FMT_FULL || fmt == CMIT_FMT_FULLER)) {
-			strbuf_grow(sb, linelen + 80);
-			add_user_info("Commit", fmt, sb, line + 10, dmode, encoding);
-		}
-	}
-}
-
-static void pp_title_line(enum cmit_fmt fmt,
-			  const char **msg_p,
-			  struct strbuf *sb,
-			  const char *subject,
-			  const char *after_subject,
-			  const char *encoding,
-			  int plain_non_ascii)
-{
-	struct strbuf title;
-
-	strbuf_init(&title, 80);
-
-	for (;;) {
-		const char *line = *msg_p;
-		int linelen = get_one_line(line);
-
-		*msg_p += linelen;
-		if (!linelen || is_empty_line(line, &linelen))
-			break;
-
-		strbuf_grow(&title, linelen + 2);
-		if (title.len) {
-			if (fmt == CMIT_FMT_EMAIL) {
-				strbuf_addch(&title, '\n');
-			}
-			strbuf_addch(&title, ' ');
-		}
-		strbuf_add(&title, line, linelen);
-	}
-
-	strbuf_grow(sb, title.len + 1024);
-	if (subject) {
-		strbuf_addstr(sb, subject);
-		add_rfc2047(sb, title.buf, title.len, encoding);
-	} else {
-		strbuf_addbuf(sb, &title);
-	}
-	strbuf_addch(sb, '\n');
-
-	if (plain_non_ascii) {
-		const char *header_fmt =
-			"MIME-Version: 1.0\n"
-			"Content-Type: text/plain; charset=%s\n"
-			"Content-Transfer-Encoding: 8bit\n";
-		strbuf_addf(sb, header_fmt, encoding);
-	}
-	if (after_subject) {
-		strbuf_addstr(sb, after_subject);
-	}
-	if (fmt == CMIT_FMT_EMAIL) {
-		strbuf_addch(sb, '\n');
-	}
-	strbuf_release(&title);
-}
-
-static void pp_remainder(enum cmit_fmt fmt,
-			 const char **msg_p,
-			 struct strbuf *sb,
-			 int indent)
-{
-	int first = 1;
-	for (;;) {
-		const char *line = *msg_p;
-		int linelen = get_one_line(line);
-		*msg_p += linelen;
-
-		if (!linelen)
-			break;
-
-		if (is_empty_line(line, &linelen)) {
-			if (first)
-				continue;
-			if (fmt == CMIT_FMT_SHORT)
-				break;
-		}
-		first = 0;
-
-		strbuf_grow(sb, linelen + indent + 20);
-		if (indent) {
-			memset(sb->buf + sb->len, ' ', indent);
-			strbuf_setlen(sb, sb->len + indent);
-		}
-		strbuf_add(sb, line, linelen);
-		strbuf_addch(sb, '\n');
-	}
-}
-
-void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
-				  struct strbuf *sb, int abbrev,
-				  const char *subject, const char *after_subject,
-				  enum date_mode dmode, int plain_non_ascii)
-{
-	unsigned long beginning_of_body;
-	int indent = 4;
-	const char *msg = commit->buffer;
-	char *reencoded;
-	const char *encoding;
-
-	if (fmt == CMIT_FMT_USERFORMAT) {
-		format_commit_message(commit, user_format, sb);
-		return;
-	}
-
-	encoding = (git_log_output_encoding
-		    ? git_log_output_encoding
-		    : git_commit_encoding);
-	if (!encoding)
-		encoding = "utf-8";
-	reencoded = logmsg_reencode(commit, encoding);
-	if (reencoded) {
-		msg = reencoded;
-	}
-
-	if (fmt == CMIT_FMT_ONELINE || fmt == CMIT_FMT_EMAIL)
-		indent = 0;
-
-	/* After-subject is used to pass in Content-Type: multipart
-	 * MIME header; in that case we do not have to do the
-	 * plaintext content type even if the commit message has
-	 * non 7-bit ASCII character.  Otherwise, check if we need
-	 * to say this is not a 7-bit ASCII.
-	 */
-	if (fmt == CMIT_FMT_EMAIL && !after_subject) {
-		int i, ch, in_body;
-
-		for (in_body = i = 0; (ch = msg[i]); i++) {
-			if (!in_body) {
-				/* author could be non 7-bit ASCII but
-				 * the log may be so; skip over the
-				 * header part first.
-				 */
-				if (ch == '\n' && msg[i+1] == '\n')
-					in_body = 1;
-			}
-			else if (non_ascii(ch)) {
-				plain_non_ascii = 1;
-				break;
-			}
-		}
-	}
-
-	pp_header(fmt, abbrev, dmode, encoding, commit, &msg, sb);
-	if (fmt != CMIT_FMT_ONELINE && !subject) {
-		strbuf_addch(sb, '\n');
-	}
-
-	/* Skip excess blank lines at the beginning of body, if any... */
-	for (;;) {
-		int linelen = get_one_line(msg);
-		int ll = linelen;
-		if (!linelen)
-			break;
-		if (!is_empty_line(msg, &ll))
-			break;
-		msg += linelen;
-	}
-
-	/* These formats treat the title line specially. */
-	if (fmt == CMIT_FMT_ONELINE || fmt == CMIT_FMT_EMAIL)
-		pp_title_line(fmt, &msg, sb, subject,
-			      after_subject, encoding, plain_non_ascii);
-
-	beginning_of_body = sb->len;
-	if (fmt != CMIT_FMT_ONELINE)
-		pp_remainder(fmt, &msg, sb, indent);
-	strbuf_rtrim(sb);
-
-	/* Make sure there is an EOLN for the non-oneline case */
-	if (fmt != CMIT_FMT_ONELINE)
-		strbuf_addch(sb, '\n');
-
-	/*
-	 * The caller may append additional body text in e-mail
-	 * format.  Make sure we did not strip the blank line
-	 * between the header and the body.
-	 */
-	if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
-		strbuf_addch(sb, '\n');
-	free(reencoded);
-}
-
 struct commit *pop_commit(struct commit_list **stack)
 {
 	struct commit_list *top = *stack;
diff --git a/pretty.c b/pretty.c
new file mode 100644
index 0000000..490cede
--- /dev/null
+++ b/pretty.c
@@ -0,0 +1,723 @@
+#include "cache.h"
+#include "commit.h"
+#include "interpolate.h"
+#include "utf8.h"
+#include "diff.h"
+#include "revision.h"
+
+static struct cmt_fmt_map {
+	const char *n;
+	size_t cmp_len;
+	enum cmit_fmt v;
+} cmt_fmts[] = {
+	{ "raw",	1,	CMIT_FMT_RAW },
+	{ "medium",	1,	CMIT_FMT_MEDIUM },
+	{ "short",	1,	CMIT_FMT_SHORT },
+	{ "email",	1,	CMIT_FMT_EMAIL },
+	{ "full",	5,	CMIT_FMT_FULL },
+	{ "fuller",	5,	CMIT_FMT_FULLER },
+	{ "oneline",	1,	CMIT_FMT_ONELINE },
+	{ "format:",	7,	CMIT_FMT_USERFORMAT},
+};
+
+static char *user_format;
+
+enum cmit_fmt get_commit_format(const char *arg)
+{
+	int i;
+
+	if (!arg || !*arg)
+		return CMIT_FMT_DEFAULT;
+	if (*arg == '=')
+		arg++;
+	if (!prefixcmp(arg, "format:")) {
+		if (user_format)
+			free(user_format);
+		user_format = xstrdup(arg + 7);
+		return CMIT_FMT_USERFORMAT;
+	}
+	for (i = 0; i < ARRAY_SIZE(cmt_fmts); i++) {
+		if (!strncmp(arg, cmt_fmts[i].n, cmt_fmts[i].cmp_len) &&
+		    !strncmp(arg, cmt_fmts[i].n, strlen(arg)))
+			return cmt_fmts[i].v;
+	}
+
+	die("invalid --pretty format: %s", arg);
+}
+
+/*
+ * Generic support for pretty-printing the header
+ */
+static int get_one_line(const char *msg)
+{
+	int ret = 0;
+
+	for (;;) {
+		char c = *msg++;
+		if (!c)
+			break;
+		ret++;
+		if (c == '\n')
+			break;
+	}
+	return ret;
+}
+
+/* High bit set, or ISO-2022-INT */
+int non_ascii(int ch)
+{
+	ch = (ch & 0xff);
+	return ((ch & 0x80) || (ch == 0x1b));
+}
+
+static int is_rfc2047_special(char ch)
+{
+	return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_'));
+}
+
+static void add_rfc2047(struct strbuf *sb, const char *line, int len,
+		       const char *encoding)
+{
+	int i, last;
+
+	for (i = 0; i < len; i++) {
+		int ch = line[i];
+		if (non_ascii(ch))
+			goto needquote;
+		if ((i + 1 < len) && (ch == '=' && line[i+1] == '?'))
+			goto needquote;
+	}
+	strbuf_add(sb, line, len);
+	return;
+
+needquote:
+	strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
+	strbuf_addf(sb, "=?%s?q?", encoding);
+	for (i = last = 0; i < len; i++) {
+		unsigned ch = line[i] & 0xFF;
+		/*
+		 * We encode ' ' using '=20' even though rfc2047
+		 * allows using '_' for readability.  Unfortunately,
+		 * many programs do not understand this and just
+		 * leave the underscore in place.
+		 */
+		if (is_rfc2047_special(ch) || ch == ' ') {
+			strbuf_add(sb, line + last, i - last);
+			strbuf_addf(sb, "=%02X", ch);
+			last = i + 1;
+		}
+	}
+	strbuf_add(sb, line + last, len - last);
+	strbuf_addstr(sb, "?=");
+}
+
+static void add_user_info(const char *what, enum cmit_fmt fmt, struct strbuf *sb,
+			 const char *line, enum date_mode dmode,
+			 const char *encoding)
+{
+	char *date;
+	int namelen;
+	unsigned long time;
+	int tz;
+	const char *filler = "    ";
+
+	if (fmt == CMIT_FMT_ONELINE)
+		return;
+	date = strchr(line, '>');
+	if (!date)
+		return;
+	namelen = ++date - line;
+	time = strtoul(date, &date, 10);
+	tz = strtol(date, NULL, 10);
+
+	if (fmt == CMIT_FMT_EMAIL) {
+		char *name_tail = strchr(line, '<');
+		int display_name_length;
+		if (!name_tail)
+			return;
+		while (line < name_tail && isspace(name_tail[-1]))
+			name_tail--;
+		display_name_length = name_tail - line;
+		filler = "";
+		strbuf_addstr(sb, "From: ");
+		add_rfc2047(sb, line, display_name_length, encoding);
+		strbuf_add(sb, name_tail, namelen - display_name_length);
+		strbuf_addch(sb, '\n');
+	} else {
+		strbuf_addf(sb, "%s: %.*s%.*s\n", what,
+			      (fmt == CMIT_FMT_FULLER) ? 4 : 0,
+			      filler, namelen, line);
+	}
+	switch (fmt) {
+	case CMIT_FMT_MEDIUM:
+		strbuf_addf(sb, "Date:   %s\n", show_date(time, tz, dmode));
+		break;
+	case CMIT_FMT_EMAIL:
+		strbuf_addf(sb, "Date: %s\n", show_date(time, tz, DATE_RFC2822));
+		break;
+	case CMIT_FMT_FULLER:
+		strbuf_addf(sb, "%sDate: %s\n", what, show_date(time, tz, dmode));
+		break;
+	default:
+		/* notin' */
+		break;
+	}
+}
+
+static int is_empty_line(const char *line, int *len_p)
+{
+	int len = *len_p;
+	while (len && isspace(line[len-1]))
+		len--;
+	*len_p = len;
+	return !len;
+}
+
+static void add_merge_info(enum cmit_fmt fmt, struct strbuf *sb,
+			const struct commit *commit, int abbrev)
+{
+	struct commit_list *parent = commit->parents;
+
+	if ((fmt == CMIT_FMT_ONELINE) || (fmt == CMIT_FMT_EMAIL) ||
+	    !parent || !parent->next)
+		return;
+
+	strbuf_addstr(sb, "Merge:");
+
+	while (parent) {
+		struct commit *p = parent->item;
+		const char *hex = NULL;
+		const char *dots;
+		if (abbrev)
+			hex = find_unique_abbrev(p->object.sha1, abbrev);
+		if (!hex)
+			hex = sha1_to_hex(p->object.sha1);
+		dots = (abbrev && strlen(hex) != 40) ?  "..." : "";
+		parent = parent->next;
+
+		strbuf_addf(sb, " %s%s", hex, dots);
+	}
+	strbuf_addch(sb, '\n');
+}
+
+static char *get_header(const struct commit *commit, const char *key)
+{
+	int key_len = strlen(key);
+	const char *line = commit->buffer;
+
+	for (;;) {
+		const char *eol = strchr(line, '\n'), *next;
+
+		if (line == eol)
+			return NULL;
+		if (!eol) {
+			eol = line + strlen(line);
+			next = NULL;
+		} else
+			next = eol + 1;
+		if (eol - line > key_len &&
+		    !strncmp(line, key, key_len) &&
+		    line[key_len] == ' ') {
+			return xmemdupz(line + key_len + 1, eol - line - key_len - 1);
+		}
+		line = next;
+	}
+}
+
+static char *replace_encoding_header(char *buf, const char *encoding)
+{
+	struct strbuf tmp;
+	size_t start, len;
+	char *cp = buf;
+
+	/* guess if there is an encoding header before a \n\n */
+	while (strncmp(cp, "encoding ", strlen("encoding "))) {
+		cp = strchr(cp, '\n');
+		if (!cp || *++cp == '\n')
+			return buf;
+	}
+	start = cp - buf;
+	cp = strchr(cp, '\n');
+	if (!cp)
+		return buf; /* should not happen but be defensive */
+	len = cp + 1 - (buf + start);
+
+	strbuf_init(&tmp, 0);
+	strbuf_attach(&tmp, buf, strlen(buf), strlen(buf) + 1);
+	if (is_encoding_utf8(encoding)) {
+		/* we have re-coded to UTF-8; drop the header */
+		strbuf_remove(&tmp, start, len);
+	} else {
+		/* just replaces XXXX in 'encoding XXXX\n' */
+		strbuf_splice(&tmp, start + strlen("encoding "),
+					  len - strlen("encoding \n"),
+					  encoding, strlen(encoding));
+	}
+	return strbuf_detach(&tmp, NULL);
+}
+
+static char *logmsg_reencode(const struct commit *commit,
+			     const char *output_encoding)
+{
+	static const char *utf8 = "utf-8";
+	const char *use_encoding;
+	char *encoding;
+	char *out;
+
+	if (!*output_encoding)
+		return NULL;
+	encoding = get_header(commit, "encoding");
+	use_encoding = encoding ? encoding : utf8;
+	if (!strcmp(use_encoding, output_encoding))
+		if (encoding) /* we'll strip encoding header later */
+			out = xstrdup(commit->buffer);
+		else
+			return NULL; /* nothing to do */
+	else
+		out = reencode_string(commit->buffer,
+				      output_encoding, use_encoding);
+	if (out)
+		out = replace_encoding_header(out, output_encoding);
+
+	free(encoding);
+	return out;
+}
+
+static void fill_person(struct interp *table, const char *msg, int len)
+{
+	int start, end, tz = 0;
+	unsigned long date;
+	char *ep;
+
+	/* parse name */
+	for (end = 0; end < len && msg[end] != '<'; end++)
+		; /* do nothing */
+	start = end + 1;
+	while (end > 0 && isspace(msg[end - 1]))
+		end--;
+	table[0].value = xmemdupz(msg, end);
+
+	if (start >= len)
+		return;
+
+	/* parse email */
+	for (end = start + 1; end < len && msg[end] != '>'; end++)
+		; /* do nothing */
+
+	if (end >= len)
+		return;
+
+	table[1].value = xmemdupz(msg + start, end - start);
+
+	/* parse date */
+	for (start = end + 1; start < len && isspace(msg[start]); start++)
+		; /* do nothing */
+	if (start >= len)
+		return;
+	date = strtoul(msg + start, &ep, 10);
+	if (msg + start == ep)
+		return;
+
+	table[5].value = xmemdupz(msg + start, ep - (msg + start));
+
+	/* parse tz */
+	for (start = ep - msg + 1; start < len && isspace(msg[start]); start++)
+		; /* do nothing */
+	if (start + 1 < len) {
+		tz = strtoul(msg + start + 1, NULL, 10);
+		if (msg[start] == '-')
+			tz = -tz;
+	}
+
+	interp_set_entry(table, 2, show_date(date, tz, DATE_NORMAL));
+	interp_set_entry(table, 3, show_date(date, tz, DATE_RFC2822));
+	interp_set_entry(table, 4, show_date(date, tz, DATE_RELATIVE));
+	interp_set_entry(table, 6, show_date(date, tz, DATE_ISO8601));
+}
+
+void format_commit_message(const struct commit *commit,
+                           const void *format, struct strbuf *sb)
+{
+	struct interp table[] = {
+		{ "%H" },	/* commit hash */
+		{ "%h" },	/* abbreviated commit hash */
+		{ "%T" },	/* tree hash */
+		{ "%t" },	/* abbreviated tree hash */
+		{ "%P" },	/* parent hashes */
+		{ "%p" },	/* abbreviated parent hashes */
+		{ "%an" },	/* author name */
+		{ "%ae" },	/* author email */
+		{ "%ad" },	/* author date */
+		{ "%aD" },	/* author date, RFC2822 style */
+		{ "%ar" },	/* author date, relative */
+		{ "%at" },	/* author date, UNIX timestamp */
+		{ "%ai" },	/* author date, ISO 8601 */
+		{ "%cn" },	/* committer name */
+		{ "%ce" },	/* committer email */
+		{ "%cd" },	/* committer date */
+		{ "%cD" },	/* committer date, RFC2822 style */
+		{ "%cr" },	/* committer date, relative */
+		{ "%ct" },	/* committer date, UNIX timestamp */
+		{ "%ci" },	/* committer date, ISO 8601 */
+		{ "%e" },	/* encoding */
+		{ "%s" },	/* subject */
+		{ "%b" },	/* body */
+		{ "%Cred" },	/* red */
+		{ "%Cgreen" },	/* green */
+		{ "%Cblue" },	/* blue */
+		{ "%Creset" },	/* reset color */
+		{ "%n" },	/* newline */
+		{ "%m" },	/* left/right/bottom */
+	};
+	enum interp_index {
+		IHASH = 0, IHASH_ABBREV,
+		ITREE, ITREE_ABBREV,
+		IPARENTS, IPARENTS_ABBREV,
+		IAUTHOR_NAME, IAUTHOR_EMAIL,
+		IAUTHOR_DATE, IAUTHOR_DATE_RFC2822, IAUTHOR_DATE_RELATIVE,
+		IAUTHOR_TIMESTAMP, IAUTHOR_ISO8601,
+		ICOMMITTER_NAME, ICOMMITTER_EMAIL,
+		ICOMMITTER_DATE, ICOMMITTER_DATE_RFC2822,
+		ICOMMITTER_DATE_RELATIVE, ICOMMITTER_TIMESTAMP,
+		ICOMMITTER_ISO8601,
+		IENCODING,
+		ISUBJECT,
+		IBODY,
+		IRED, IGREEN, IBLUE, IRESET_COLOR,
+		INEWLINE,
+		ILEFT_RIGHT,
+	};
+	struct commit_list *p;
+	char parents[1024];
+	unsigned long len;
+	int i;
+	enum { HEADER, SUBJECT, BODY } state;
+	const char *msg = commit->buffer;
+
+	if (ILEFT_RIGHT + 1 != ARRAY_SIZE(table))
+		die("invalid interp table!");
+
+	/* these are independent of the commit */
+	interp_set_entry(table, IRED, "\033[31m");
+	interp_set_entry(table, IGREEN, "\033[32m");
+	interp_set_entry(table, IBLUE, "\033[34m");
+	interp_set_entry(table, IRESET_COLOR, "\033[m");
+	interp_set_entry(table, INEWLINE, "\n");
+
+	/* these depend on the commit */
+	if (!commit->object.parsed)
+		parse_object(commit->object.sha1);
+	interp_set_entry(table, IHASH, sha1_to_hex(commit->object.sha1));
+	interp_set_entry(table, IHASH_ABBREV,
+			find_unique_abbrev(commit->object.sha1,
+				DEFAULT_ABBREV));
+	interp_set_entry(table, ITREE, sha1_to_hex(commit->tree->object.sha1));
+	interp_set_entry(table, ITREE_ABBREV,
+			find_unique_abbrev(commit->tree->object.sha1,
+				DEFAULT_ABBREV));
+	interp_set_entry(table, ILEFT_RIGHT,
+			 (commit->object.flags & BOUNDARY)
+			 ? "-"
+			 : (commit->object.flags & SYMMETRIC_LEFT)
+			 ? "<"
+			 : ">");
+
+	parents[1] = 0;
+	for (i = 0, p = commit->parents;
+			p && i < sizeof(parents) - 1;
+			p = p->next)
+		i += snprintf(parents + i, sizeof(parents) - i - 1, " %s",
+			sha1_to_hex(p->item->object.sha1));
+	interp_set_entry(table, IPARENTS, parents + 1);
+
+	parents[1] = 0;
+	for (i = 0, p = commit->parents;
+			p && i < sizeof(parents) - 1;
+			p = p->next)
+		i += snprintf(parents + i, sizeof(parents) - i - 1, " %s",
+			find_unique_abbrev(p->item->object.sha1,
+				DEFAULT_ABBREV));
+	interp_set_entry(table, IPARENTS_ABBREV, parents + 1);
+
+	for (i = 0, state = HEADER; msg[i] && state < BODY; i++) {
+		int eol;
+		for (eol = i; msg[eol] && msg[eol] != '\n'; eol++)
+			; /* do nothing */
+
+		if (state == SUBJECT) {
+			table[ISUBJECT].value = xmemdupz(msg + i, eol - i);
+			i = eol;
+		}
+		if (i == eol) {
+			state++;
+			/* strip empty lines */
+			while (msg[eol + 1] == '\n')
+				eol++;
+		} else if (!prefixcmp(msg + i, "author "))
+			fill_person(table + IAUTHOR_NAME,
+					msg + i + 7, eol - i - 7);
+		else if (!prefixcmp(msg + i, "committer "))
+			fill_person(table + ICOMMITTER_NAME,
+					msg + i + 10, eol - i - 10);
+		else if (!prefixcmp(msg + i, "encoding "))
+			table[IENCODING].value =
+				xmemdupz(msg + i + 9, eol - i - 9);
+		i = eol;
+	}
+	if (msg[i])
+		table[IBODY].value = xstrdup(msg + i);
+
+	len = interpolate(sb->buf + sb->len, strbuf_avail(sb),
+				format, table, ARRAY_SIZE(table));
+	if (len > strbuf_avail(sb)) {
+		strbuf_grow(sb, len);
+		interpolate(sb->buf + sb->len, strbuf_avail(sb) + 1,
+					format, table, ARRAY_SIZE(table));
+	}
+	strbuf_setlen(sb, sb->len + len);
+	interp_clear_table(table, ARRAY_SIZE(table));
+}
+
+static void pp_header(enum cmit_fmt fmt,
+		      int abbrev,
+		      enum date_mode dmode,
+		      const char *encoding,
+		      const struct commit *commit,
+		      const char **msg_p,
+		      struct strbuf *sb)
+{
+	int parents_shown = 0;
+
+	for (;;) {
+		const char *line = *msg_p;
+		int linelen = get_one_line(*msg_p);
+
+		if (!linelen)
+			return;
+		*msg_p += linelen;
+
+		if (linelen == 1)
+			/* End of header */
+			return;
+
+		if (fmt == CMIT_FMT_RAW) {
+			strbuf_add(sb, line, linelen);
+			continue;
+		}
+
+		if (!memcmp(line, "parent ", 7)) {
+			if (linelen != 48)
+				die("bad parent line in commit");
+			continue;
+		}
+
+		if (!parents_shown) {
+			struct commit_list *parent;
+			int num;
+			for (parent = commit->parents, num = 0;
+			     parent;
+			     parent = parent->next, num++)
+				;
+			/* with enough slop */
+			strbuf_grow(sb, num * 50 + 20);
+			add_merge_info(fmt, sb, commit, abbrev);
+			parents_shown = 1;
+		}
+
+		/*
+		 * MEDIUM == DEFAULT shows only author with dates.
+		 * FULL shows both authors but not dates.
+		 * FULLER shows both authors and dates.
+		 */
+		if (!memcmp(line, "author ", 7)) {
+			strbuf_grow(sb, linelen + 80);
+			add_user_info("Author", fmt, sb, line + 7, dmode, encoding);
+		}
+		if (!memcmp(line, "committer ", 10) &&
+		    (fmt == CMIT_FMT_FULL || fmt == CMIT_FMT_FULLER)) {
+			strbuf_grow(sb, linelen + 80);
+			add_user_info("Commit", fmt, sb, line + 10, dmode, encoding);
+		}
+	}
+}
+
+static void pp_title_line(enum cmit_fmt fmt,
+			  const char **msg_p,
+			  struct strbuf *sb,
+			  const char *subject,
+			  const char *after_subject,
+			  const char *encoding,
+			  int plain_non_ascii)
+{
+	struct strbuf title;
+
+	strbuf_init(&title, 80);
+
+	for (;;) {
+		const char *line = *msg_p;
+		int linelen = get_one_line(line);
+
+		*msg_p += linelen;
+		if (!linelen || is_empty_line(line, &linelen))
+			break;
+
+		strbuf_grow(&title, linelen + 2);
+		if (title.len) {
+			if (fmt == CMIT_FMT_EMAIL) {
+				strbuf_addch(&title, '\n');
+			}
+			strbuf_addch(&title, ' ');
+		}
+		strbuf_add(&title, line, linelen);
+	}
+
+	strbuf_grow(sb, title.len + 1024);
+	if (subject) {
+		strbuf_addstr(sb, subject);
+		add_rfc2047(sb, title.buf, title.len, encoding);
+	} else {
+		strbuf_addbuf(sb, &title);
+	}
+	strbuf_addch(sb, '\n');
+
+	if (plain_non_ascii) {
+		const char *header_fmt =
+			"MIME-Version: 1.0\n"
+			"Content-Type: text/plain; charset=%s\n"
+			"Content-Transfer-Encoding: 8bit\n";
+		strbuf_addf(sb, header_fmt, encoding);
+	}
+	if (after_subject) {
+		strbuf_addstr(sb, after_subject);
+	}
+	if (fmt == CMIT_FMT_EMAIL) {
+		strbuf_addch(sb, '\n');
+	}
+	strbuf_release(&title);
+}
+
+static void pp_remainder(enum cmit_fmt fmt,
+			 const char **msg_p,
+			 struct strbuf *sb,
+			 int indent)
+{
+	int first = 1;
+	for (;;) {
+		const char *line = *msg_p;
+		int linelen = get_one_line(line);
+		*msg_p += linelen;
+
+		if (!linelen)
+			break;
+
+		if (is_empty_line(line, &linelen)) {
+			if (first)
+				continue;
+			if (fmt == CMIT_FMT_SHORT)
+				break;
+		}
+		first = 0;
+
+		strbuf_grow(sb, linelen + indent + 20);
+		if (indent) {
+			memset(sb->buf + sb->len, ' ', indent);
+			strbuf_setlen(sb, sb->len + indent);
+		}
+		strbuf_add(sb, line, linelen);
+		strbuf_addch(sb, '\n');
+	}
+}
+
+void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
+				  struct strbuf *sb, int abbrev,
+				  const char *subject, const char *after_subject,
+				  enum date_mode dmode, int plain_non_ascii)
+{
+	unsigned long beginning_of_body;
+	int indent = 4;
+	const char *msg = commit->buffer;
+	char *reencoded;
+	const char *encoding;
+
+	if (fmt == CMIT_FMT_USERFORMAT) {
+		format_commit_message(commit, user_format, sb);
+		return;
+	}
+
+	encoding = (git_log_output_encoding
+		    ? git_log_output_encoding
+		    : git_commit_encoding);
+	if (!encoding)
+		encoding = "utf-8";
+	reencoded = logmsg_reencode(commit, encoding);
+	if (reencoded) {
+		msg = reencoded;
+	}
+
+	if (fmt == CMIT_FMT_ONELINE || fmt == CMIT_FMT_EMAIL)
+		indent = 0;
+
+	/* After-subject is used to pass in Content-Type: multipart
+	 * MIME header; in that case we do not have to do the
+	 * plaintext content type even if the commit message has
+	 * non 7-bit ASCII character.  Otherwise, check if we need
+	 * to say this is not a 7-bit ASCII.
+	 */
+	if (fmt == CMIT_FMT_EMAIL && !after_subject) {
+		int i, ch, in_body;
+
+		for (in_body = i = 0; (ch = msg[i]); i++) {
+			if (!in_body) {
+				/* author could be non 7-bit ASCII but
+				 * the log may be so; skip over the
+				 * header part first.
+				 */
+				if (ch == '\n' && msg[i+1] == '\n')
+					in_body = 1;
+			}
+			else if (non_ascii(ch)) {
+				plain_non_ascii = 1;
+				break;
+			}
+		}
+	}
+
+	pp_header(fmt, abbrev, dmode, encoding, commit, &msg, sb);
+	if (fmt != CMIT_FMT_ONELINE && !subject) {
+		strbuf_addch(sb, '\n');
+	}
+
+	/* Skip excess blank lines at the beginning of body, if any... */
+	for (;;) {
+		int linelen = get_one_line(msg);
+		int ll = linelen;
+		if (!linelen)
+			break;
+		if (!is_empty_line(msg, &ll))
+			break;
+		msg += linelen;
+	}
+
+	/* These formats treat the title line specially. */
+	if (fmt == CMIT_FMT_ONELINE || fmt == CMIT_FMT_EMAIL)
+		pp_title_line(fmt, &msg, sb, subject,
+			      after_subject, encoding, plain_non_ascii);
+
+	beginning_of_body = sb->len;
+	if (fmt != CMIT_FMT_ONELINE)
+		pp_remainder(fmt, &msg, sb, indent);
+	strbuf_rtrim(sb);
+
+	/* Make sure there is an EOLN for the non-oneline case */
+	if (fmt != CMIT_FMT_ONELINE)
+		strbuf_addch(sb, '\n');
+
+	/*
+	 * The caller may append additional body text in e-mail
+	 * format.  Make sure we did not strip the blank line
+	 * between the header and the body.
+	 */
+	if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
+		strbuf_addch(sb, '\n');
+	free(reencoded);
+}
-- 
1.5.3.5.1549.g91a3

^ permalink raw reply related
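
For readers following the patch above: fill_person() splits an author or
committer header of the form "Name <email> timestamp timezone". Below is a
minimal standalone sketch of that parsing. It is not git's code: struct
ident, copy_span() and parse_ident() are hypothetical names, and the
fixed-size buffers are an assumption made to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container; git stores these in its interp table instead. */
struct ident {
	char name[128];
	char email[128];
	unsigned long when;	/* seconds since the epoch */
	int tz;			/* "+0100" is stored as 100, "-0500" as -500 */
};

/* Copy at most dstlen - 1 bytes of src and NUL-terminate the result. */
static void copy_span(char *dst, size_t dstlen, const char *src, size_t n)
{
	assert(dstlen > 0);
	if (n >= dstlen)
		n = dstlen - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

/* Returns 0 on success, -1 if the line has no <email> or no timestamp. */
static int parse_ident(const char *line, struct ident *out)
{
	const char *lt = strchr(line, '<');
	const char *gt = lt ? strchr(lt + 1, '>') : NULL;
	const char *name_end;
	char *ep;

	if (!lt || !gt)
		return -1;

	/* The name is everything before '<', minus trailing whitespace. */
	name_end = lt;
	while (name_end > line && isspace((unsigned char)name_end[-1]))
		name_end--;
	copy_span(out->name, sizeof(out->name), line, (size_t)(name_end - line));
	copy_span(out->email, sizeof(out->email), lt + 1, (size_t)(gt - lt - 1));

	/* strtoul() skips the whitespace between '>' and the timestamp. */
	out->when = strtoul(gt + 1, &ep, 10);
	if (ep == gt + 1)
		return -1;

	/* The timezone is optional: a sign followed by decimal digits. */
	while (isspace((unsigned char)*ep))
		ep++;
	out->tz = 0;
	if (*ep == '+' || *ep == '-') {
		int sign = (*ep == '-') ? -1 : 1;
		out->tz = sign * (int)strtoul(ep + 1, NULL, 10);
	}
	return 0;
}
```

Like the original, the sketch reads the timezone as a plain decimal number,
so the value still has to be interpreted as hours and minutes by whatever
formats the date.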

* [PATCH 2/3] interpolate.[ch]: Add a function to find which interpolations are active.
From: Johannes Schindelin @ 2007-11-04 19:15 UTC (permalink / raw)
  To: git, Rene Scharfe, gitster
In-Reply-To: <Pine.LNX.4.64.0711041912190.4362@racer.site>


Some substitutions require pretty expensive operations.  So it makes
sense to find out which are needed to begin with.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 interpolate.c |   20 ++++++++++++++++++++
 interpolate.h |    2 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/interpolate.c b/interpolate.c
index 6ef53f2..80eeb36 100644
--- a/interpolate.c
+++ b/interpolate.c
@@ -102,3 +102,23 @@ unsigned long interpolate(char *result, unsigned long reslen,
 		*dest = '\0';
 	return newlen;
 }
+
+char *interp_find_active(const char *orig,
+		const struct interp *interps, int ninterps)
+{
+	char *result = xcalloc(1, ninterps);
+	char c;
+	int i;
+
+	while ((c = *(orig++)))
+		if (c == '%')
+			/* Try to match an interpolation string. */
+			for (i = 0; i < ninterps; i++)
+				if (!prefixcmp(orig, interps[i].name + 1)) {
+					result[i] = 1;
+					orig += strlen(interps[i].name + 1);
+					break;
+				}
+
+	return result;
+}
diff --git a/interpolate.h b/interpolate.h
index 77407e6..2d197c5 100644
--- a/interpolate.h
+++ b/interpolate.h
@@ -22,5 +22,7 @@ extern void interp_clear_table(struct interp *table, int ninterps);
 extern unsigned long interpolate(char *result, unsigned long reslen,
 				 const char *orig,
 				 const struct interp *interps, int ninterps);
+extern char *interp_find_active(const char *orig,
+				const struct interp *interps, int ninterps);
 
 #endif /* INTERPOLATE_H */
-- 
1.5.3.5.1549.g91a3

^ permalink raw reply related
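
The scan that interp_find_active() performs in the patch above can be tried
outside of git. The following is a minimal sketch of the same idea, assuming
a table of placeholder names that all start with '%'. struct placeholder and
find_active() are hypothetical stand-ins for struct interp and the patched
function, and plain calloc() replaces git's xcalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's struct interp; only the name is used. */
struct placeholder {
	const char *name;	/* e.g. "%H", "%an" */
};

/*
 * Return a freshly allocated array of n flags; flag i is 1 if
 * tab[i].name occurs in fmt.  The caller frees the result.
 */
static char *find_active(const char *fmt, const struct placeholder *tab, int n)
{
	char *active = calloc(1, n > 0 ? (size_t)n : 1);
	int i;

	if (!active)
		return NULL;
	while (*fmt) {
		if (*fmt++ != '%')
			continue;
		/* fmt now points just past the '%'; the first match wins. */
		for (i = 0; i < n; i++) {
			const char *rest = tab[i].name + 1;
			size_t len = strlen(rest);

			if (!strncmp(fmt, rest, len)) {
				active[i] = 1;
				fmt += len;
				break;
			}
		}
	}
	return active;
}
```

Because the first matching entry wins, a table in which one name is a prefix
of another would need the longer name listed first. The table in pretty.c
appears to have no such pair, so the order does not matter there.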

* [PATCH 3/3] pretty=format: Avoid some expensive calculations when not needed
From: Johannes Schindelin @ 2007-11-04 19:15 UTC (permalink / raw)
  To: git, Rene Scharfe, gitster
In-Reply-To: <Pine.LNX.4.64.0711041912190.4362@racer.site>


Use the new function interp_find_active() to avoid calculating the
unique hash names, and other things, when they are not even asked for.

Unfortunately, we cannot reuse the result of that function, which
would be cleaner: there are more users than just git log.  Most
notably, git-archive with "$Format:...$" substitution.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
	So I found another reason why the function has to be called
	every time.  But this reason appeals to me much more.

	Originally, I wanted to do this differently, by providing a
	function which generates the substitutions, but the header
	parsing makes that infeasible.

 pretty.c |   55 ++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/pretty.c b/pretty.c
index 490cede..241e91c 100644
--- a/pretty.c
+++ b/pretty.c
@@ -393,6 +393,7 @@ void format_commit_message(const struct commit *commit,
 	int i;
 	enum { HEADER, SUBJECT, BODY } state;
 	const char *msg = commit->buffer;
+	char *active = interp_find_active(format, table, ARRAY_SIZE(table));
 
 	if (ILEFT_RIGHT + 1 != ARRAY_SIZE(table))
 		die("invalid interp table!");
@@ -407,12 +408,18 @@ void format_commit_message(const struct commit *commit,
 	/* these depend on the commit */
 	if (!commit->object.parsed)
 		parse_object(commit->object.sha1);
-	interp_set_entry(table, IHASH, sha1_to_hex(commit->object.sha1));
-	interp_set_entry(table, IHASH_ABBREV,
+	if (active[IHASH])
+		interp_set_entry(table, IHASH,
+				sha1_to_hex(commit->object.sha1));
+	if (active[IHASH_ABBREV])
+		interp_set_entry(table, IHASH_ABBREV,
 			find_unique_abbrev(commit->object.sha1,
 				DEFAULT_ABBREV));
-	interp_set_entry(table, ITREE, sha1_to_hex(commit->tree->object.sha1));
-	interp_set_entry(table, ITREE_ABBREV,
+	if (active[ITREE])
+		interp_set_entry(table, ITREE,
+				sha1_to_hex(commit->tree->object.sha1));
+	if (active[ITREE_ABBREV])
+		interp_set_entry(table, ITREE_ABBREV,
 			find_unique_abbrev(commit->tree->object.sha1,
 				DEFAULT_ABBREV));
 	interp_set_entry(table, ILEFT_RIGHT,
@@ -422,22 +429,27 @@ void format_commit_message(const struct commit *commit,
 			 ? "<"
 			 : ">");
 
-	parents[1] = 0;
-	for (i = 0, p = commit->parents;
-			p && i < sizeof(parents) - 1;
-			p = p->next)
-		i += snprintf(parents + i, sizeof(parents) - i - 1, " %s",
-			sha1_to_hex(p->item->object.sha1));
-	interp_set_entry(table, IPARENTS, parents + 1);
-
-	parents[1] = 0;
-	for (i = 0, p = commit->parents;
-			p && i < sizeof(parents) - 1;
-			p = p->next)
-		i += snprintf(parents + i, sizeof(parents) - i - 1, " %s",
-			find_unique_abbrev(p->item->object.sha1,
-				DEFAULT_ABBREV));
-	interp_set_entry(table, IPARENTS_ABBREV, parents + 1);
+	if (active[IPARENTS]) {
+		parents[1] = 0;
+		for (i = 0, p = commit->parents;
+				p && i < sizeof(parents) - 1;
+				p = p->next)
+			i += snprintf(parents + i, sizeof(parents) - i - 1,
+				" %s", sha1_to_hex(p->item->object.sha1));
+		interp_set_entry(table, IPARENTS, parents + 1);
+	}
+
+	if (active[IPARENTS_ABBREV]) {
+		parents[1] = 0;
+		for (i = 0, p = commit->parents;
+				p && i < sizeof(parents) - 1;
+				p = p->next)
+			i += snprintf(parents + i, sizeof(parents) - i - 1,
+				" %s",
+				find_unique_abbrev(p->item->object.sha1,
+					DEFAULT_ABBREV));
+		interp_set_entry(table, IPARENTS_ABBREV, parents + 1);
+	}
 
 	for (i = 0, state = HEADER; msg[i] && state < BODY; i++) {
 		int eol;
@@ -464,7 +476,7 @@ void format_commit_message(const struct commit *commit,
 				xmemdupz(msg + i + 9, eol - i - 9);
 		i = eol;
 	}
-	if (msg[i])
+	if (active[IBODY] && msg[i])
 		table[IBODY].value = xstrdup(msg + i);
 
 	len = interpolate(sb->buf + sb->len, strbuf_avail(sb),
@@ -476,6 +488,7 @@ void format_commit_message(const struct commit *commit,
 	}
 	strbuf_setlen(sb, sb->len + len);
 	interp_clear_table(table, ARRAY_SIZE(table));
+	free(active);
 }
 
 static void pp_header(enum cmit_fmt fmt,
-- 
1.5.3.5.1549.g91a3

^ permalink raw reply related
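
The effect of the patch above, skipping costly substitutions that the format
string never asks for, can be shown with a toy formatter. This is only a
sketch under simplifying assumptions: it knows just "%h" and "%s", it uses
strstr() where the patch uses interp_find_active(), and expensive_abbrev()
with its call counter is a made-up stand-in for find_unique_abbrev().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int expensive_calls;	/* how often the costly path actually ran */

/* Made-up stand-in for a costly lookup such as abbreviating a hash. */
static const char *expensive_abbrev(void)
{
	expensive_calls++;
	return "91a3f00";
}

/* Expand "%h" and "%s" in fmt into out; other text is copied verbatim. */
static void format_lazy(const char *fmt, const char *subject,
			char *out, size_t outlen)
{
	/* Decide up front whether the expensive value is needed at all. */
	const char *abbrev = strstr(fmt, "%h") ? expensive_abbrev() : "";
	size_t pos = 0;

	assert(outlen > 0);
	while (*fmt && pos + 1 < outlen) {
		const char *val = NULL;

		if (fmt[0] == '%' && fmt[1] == 'h')
			val = abbrev;
		else if (fmt[0] == '%' && fmt[1] == 's')
			val = subject;

		if (val) {
			size_t room = outlen - pos;
			int n = snprintf(out + pos, room, "%s", val);

			if (n < 0)
				break;
			/* snprintf() reports the untruncated length. */
			pos += ((size_t)n >= room) ? room - 1 : (size_t)n;
			fmt += 2;
		} else {
			out[pos++] = *fmt++;
		}
	}
	out[pos] = '\0';
}
```

Calling format_lazy() with "%s" alone leaves expensive_calls untouched, which
is the saving the patch is after for formats that do not mention %h, %t, %P
or %p.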

* Re: [PATCH] Make git-clean a builtin
From: Pierre Habouzit @ 2007-11-04 19:41 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: git, gitster
In-Reply-To: <11942029474058-git-send-email-shawn.bohrer@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1086 bytes --]

On Sun, Nov 04, 2007 at 07:02:21PM +0000, Shawn Bohrer wrote:

> +	for (i = 1; i < argc; i++) {
> +		const char *arg = argv[i];
> +
> +		if (arg[0] != '-')
> +			break;
> +		if (!strcmp(arg, "--")) {
> +			i++;
> +			break;
> +		}
> +		if (!strcmp(arg, "-n")) {
> +			show_only = 1;
> +			disabled = 0;
> +			continue;
> +		}
> +		if (!strcmp(arg, "-f")) {
> +			disabled = 0;
> +			continue;
> +		}
> +		if (!strcmp(arg, "-d")) {
> +			remove_directories = 1;
> +			continue;
> +		}
> +		if (!strcmp(arg, "-q")) {
> +			quiet = 1;
> +			continue;
> +		}
> +		if (!strcmp(arg, "-x")) {
> +			ignored = 1;
> +			continue;
> +		}
> +		if (!strcmp(arg, "-X")) {
> +			ignored_only = 1;
> +			dir.show_ignored =1;
> +			dir.exclude_per_dir = ".gitignore";
> +			continue;
> +		}
> +		usage(builtin_clean_usage);

  parse-options.c is now in next; please use it.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* [PATCH] upload-pack: Use finish_{command,async}() instead of waitpid().
From: Johannes Sixt @ 2007-11-04 19:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

upload-pack spawns two processes, rev-list and pack-objects, and carefully
monitors their status so that it can report failure to the remote end.
This change removes the complicated procedures on the grounds of the
following observations:

- If everything is OK, rev-list closes its output pipe end, upon which
  pack-objects (which reads from the pipe) sees EOF and terminates itself,
  closing its output (and error) pipes. upload-pack reads from both until
  it sees EOF in both. It collects the exit codes of the child processes
  (which indicate success) and terminates successfully.

- If rev-list sees an error, it closes its output and terminates with
  failure. pack-objects sees EOF in its input and terminates successfully.
  Again upload-pack reads its inputs until EOF. When it now collects
  the exit codes of its child processes, it notices the failure of rev-list
  and signals failure to the remote end.

- If pack-objects sees an error, it terminates with failure. Since this
  breaks the pipe to rev-list, rev-list is killed with SIGPIPE.
  upload-pack reads its input until EOF, then collects the exit codes of
  the child processes, notices their failures, and signals failure to the
  remote end.

- If upload-pack itself dies unexpectedly, pack-objects is killed with
  SIGPIPE, and subsequently also rev-list.

The upshot of this is that precise monitoring of child processes is not
required because both terminate if either one of them dies unexpectedly.
This allows us to use finish_command() and finish_async() instead of
an explicit waitpid(2) call.

The change is smaller than it looks because most of it only reduces the
indentation of a large part of the inner loop.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
---
	This patch allows us to reduce the differences to the
	MinGW port even further. It goes on top of js/forkexec
	(which meanwhile is in master).

	The test case checks for failures in rev-list (a missing
	object). Any hints how to trigger a failure in pack-objects
	that does not also trigger in rev-list would be welcome.

	BTW, I don't know what happens to zombie processes if the
	parent does not waitpid(), but just terminates. Does this
	work as expected, i.e. no zombies are left behind?

	-- Hannes

 t/t5530-upload-pack-error.sh |   49 +++++++++++
 upload-pack.c                |  192 +++++++++++++++++-------------------------
 2 files changed, 126 insertions(+), 115 deletions(-)
 create mode 100755 t/t5530-upload-pack-error.sh

diff --git a/t/t5530-upload-pack-error.sh b/t/t5530-upload-pack-error.sh
new file mode 100755
index 0000000..9923ba0
--- /dev/null
+++ b/t/t5530-upload-pack-error.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+test_description='errors in upload-pack'
+
+. ./test-lib.sh
+
+D=`pwd`
+
+test_expect_success 'setup and corrupt repository' '
+
+	echo file >file &&
+	git add file &&
+	git rev-parse :file &&
+	git commit -a -m original &&
+	test_tick &&
+	echo changed >file &&
+	git commit -a -m changed &&
+	rm -f .git/objects/f7/3f3093ff865c514c6c51f867e35f693487d0d3
+
+'
+
+test_expect_failure 'fsck fails' '
+
+	git fsck
+'
+
+test_expect_failure 'upload pack fails due to error in rev-list' '
+
+	echo "0032want $(git rev-parse HEAD)
+00000009done
+0000" | git-upload-pack . > /dev/null
+
+'
+
+test_expect_success 'create empty repository' '
+
+	mkdir foo &&
+	cd foo &&
+	git init
+
+'
+
+test_expect_failure 'fetch fails' '
+
+	git fetch .. master
+
+'
+
+test_done
diff --git a/upload-pack.c b/upload-pack.c
index 6799468..7e04311 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -144,6 +144,7 @@ static void create_pack_file(void)
 	char abort_msg[] = "aborting due to possible repository "
 		"corruption on the remote side.";
 	int buffered = -1;
+	ssize_t sz;
 	const char *argv[10];
 	int arg = 0;
 
@@ -168,22 +169,15 @@ static void create_pack_file(void)
 	pack_objects.git_cmd = 1;
 	pack_objects.argv = argv;
 
-	if (start_command(&pack_objects)) {
-		/* daemon sets things up to ignore TERM */
-		kill(rev_list.pid, SIGKILL);
+	if (start_command(&pack_objects))
 		die("git-upload-pack: unable to fork git-pack-objects");
-	}
 
 	/* We read from pack_objects.err to capture stderr output for
 	 * progress bar, and pack_objects.out to capture the pack data.
 	 */
 
 	while (1) {
-		const char *who;
 		struct pollfd pfd[2];
-		pid_t pid;
-		int status;
-		ssize_t sz;
 		int pe, pu, pollsize;
 
 		reset_timeout();
@@ -204,123 +198,91 @@ static void create_pack_file(void)
 			pollsize++;
 		}
 
-		if (pollsize) {
-			if (poll(pfd, pollsize, -1) < 0) {
-				if (errno != EINTR) {
-					error("poll failed, resuming: %s",
-					      strerror(errno));
-					sleep(1);
-				}
-				continue;
-			}
-			if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
-				/* Data ready; we keep the last byte
-				 * to ourselves in case we detect
-				 * broken rev-list, so that we can
-				 * leave the stream corrupted.  This
-				 * is unfortunate -- unpack-objects
-				 * would happily accept a valid pack
-				 * data with trailing garbage, so
-				 * appending garbage after we pass all
-				 * the pack data is not good enough to
-				 * signal breakage to downstream.
-				 */
-				char *cp = data;
-				ssize_t outsz = 0;
-				if (0 <= buffered) {
-					*cp++ = buffered;
-					outsz++;
-				}
-				sz = xread(pack_objects.out, cp,
-					  sizeof(data) - outsz);
-				if (0 < sz)
-						;
-				else if (sz == 0) {
-					close(pack_objects.out);
-					pack_objects.out = -1;
-				}
-				else
-					goto fail;
-				sz += outsz;
-				if (1 < sz) {
-					buffered = data[sz-1] & 0xFF;
-					sz--;
-				}
-				else
-					buffered = -1;
-				sz = send_client_data(1, data, sz);
-				if (sz < 0)
-					goto fail;
-			}
-			if (0 <= pe && (pfd[pe].revents & (POLLIN|POLLHUP))) {
-				/* Status ready; we ship that in the side-band
-				 * or dump to the standard error.
-				 */
-				sz = xread(pack_objects.err, progress,
-					  sizeof(progress));
-				if (0 < sz)
-					send_client_data(2, progress, sz);
-				else if (sz == 0) {
-					close(pack_objects.err);
-					pack_objects.err = -1;
-				}
-				else
-					goto fail;
+		if (!pollsize)
+			break;
+
+		if (poll(pfd, pollsize, -1) < 0) {
+			if (errno != EINTR) {
+				error("poll failed, resuming: %s",
+				      strerror(errno));
+				sleep(1);
 			}
+			continue;
 		}
-
-		/* See if the children are still there */
-		if (rev_list.pid || pack_objects.pid) {
-			pid = waitpid(-1, &status, WNOHANG);
-			if (!pid)
-				continue;
-			who = ((pid == rev_list.pid) ? "git-rev-list" :
-			       (pid == pack_objects.pid) ? "git-pack-objects" :
-			       NULL);
-			if (!who) {
-				if (pid < 0) {
-					error("git-upload-pack: %s",
-					      strerror(errno));
-					goto fail;
-				}
-				error("git-upload-pack: we weren't "
-				      "waiting for %d", pid);
-				continue;
+		if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
+			/* Data ready; we keep the last byte to ourselves
+			 * in case we detect broken rev-list, so that we
+			 * can leave the stream corrupted.  This is
+			 * unfortunate -- unpack-objects would happily
+			 * accept a valid packdata with trailing garbage,
+			 * so appending garbage after we pass all the
+			 * pack data is not good enough to signal
+			 * breakage to downstream.
+			 */
+			char *cp = data;
+			ssize_t outsz = 0;
+			if (0 <= buffered) {
+				*cp++ = buffered;
+				outsz++;
+			}
+			sz = xread(pack_objects.out, cp,
+				  sizeof(data) - outsz);
+			if (0 < sz)
+					;
+			else if (sz == 0) {
+				close(pack_objects.out);
+				pack_objects.out = -1;
 			}
-			if (!WIFEXITED(status) || WEXITSTATUS(status) > 0) {
-				error("git-upload-pack: %s died with error.",
-				      who);
+			else
 				goto fail;
+			sz += outsz;
+			if (1 < sz) {
+				buffered = data[sz-1] & 0xFF;
+				sz--;
 			}
-			if (pid == rev_list.pid)
-				rev_list.pid = 0;
-			if (pid == pack_objects.pid)
-				pack_objects.pid = 0;
-			if (rev_list.pid || pack_objects.pid)
-				continue;
-		}
-
-		/* both died happily */
-		if (pollsize)
-			continue;
-
-		/* flush the data */
-		if (0 <= buffered) {
-			data[0] = buffered;
-			sz = send_client_data(1, data, 1);
+			else
+				buffered = -1;
+			sz = send_client_data(1, data, sz);
 			if (sz < 0)
 				goto fail;
-			fprintf(stderr, "flushed.\n");
 		}
-		if (use_sideband)
-			packet_flush(1);
-		return;
+		if (0 <= pe && (pfd[pe].revents & (POLLIN|POLLHUP))) {
+			/* Status ready; we ship that in the side-band
+			 * or dump to the standard error.
+			 */
+			sz = xread(pack_objects.err, progress,
+				  sizeof(progress));
+			if (0 < sz)
+				send_client_data(2, progress, sz);
+			else if (sz == 0) {
+				close(pack_objects.err);
+				pack_objects.err = -1;
+			}
+			else
+				goto fail;
+		}
+	}
+
+	if (finish_command(&pack_objects)) {
+		error("git-upload-pack: git-pack-objects died with error.");
+		goto fail;
+	}
+	if (finish_async(&rev_list))
+		goto fail;	/* error was already reported */
+
+	/* flush the data */
+	if (0 <= buffered) {
+		data[0] = buffered;
+		sz = send_client_data(1, data, 1);
+		if (sz < 0)
+			goto fail;
+		fprintf(stderr, "flushed.\n");
 	}
+	if (use_sideband)
+		packet_flush(1);
+	return;
+
  fail:
-	if (pack_objects.pid)
-		kill(pack_objects.pid, SIGKILL);
-	if (rev_list.pid)
-		kill(rev_list.pid, SIGKILL);
 	send_client_data(3, abort_msg, sizeof(abort_msg));
 	die("git-upload-pack: %s", abort_msg);
 }
-- 
1.5.3.4.315.g2ce38

^ permalink raw reply related
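
The pattern the patch above relies on, reading the child's output until EOF
and collecting its exit code only afterwards, can be sketched with plain
POSIX calls. This is not git's run-command API: drain_then_reap() is a
hypothetical helper, and the child here merely writes a message and exits
with a chosen code instead of running rev-list or pack-objects.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes len bytes of msg to a pipe and exits with
 * `code`.  The parent drains the pipe until EOF and only then reaps the
 * child.  Returns the child's exit code, or -1 on error or abnormal
 * termination; *nread receives the number of bytes read.
 */
static int drain_then_reap(const char *msg, size_t len, int code,
			   size_t *nread)
{
	int fd[2], status;
	char buf[256];
	ssize_t sz;
	pid_t pid;

	*nread = 0;
	if (pipe(fd) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: write the message, then exit with the given code. */
		close(fd[0]);
		if (len && write(fd[1], msg, len) < 0)
			_exit(127);
		_exit(code);
	}

	/* Parent: close the write end, or EOF would never arrive. */
	close(fd[1]);
	for (;;) {
		sz = read(fd[0], buf, sizeof(buf));
		if (sz > 0)
			*nread += (size_t)sz;
		else if (sz == 0 || errno != EINTR)
			break;	/* EOF or a real error */
	}
	close(fd[0]);

	/* The exit status is the only monitoring that is needed. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent never polls for the child's status while data is flowing. A child
that dies early simply causes EOF on the pipe, and the failure shows up in
the exit code collected at the end.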

* [PATCH 3/2] Enhance --early-output format
From: Linus Torvalds @ 2007-11-04 20:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Marco Costalba, Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0711041004220.15101@woody.linux-foundation.org>


This makes --early-output a bit more advanced, and actually makes it 
generate multiple "Final output:" headers as it updates things 
asynchronously. I realize that the "Final output:" line is now illogical, 
since it's not really final until it also says "done", but 

It now _always_ generates a "Final output:" header in front of any commit 
list, and that output header gives you a *guess* at the maximum number of 
commits available. However, it should be noted that the guess can be 
completely off: I do a reasonable job estimating it, but it is not meant 
to be exact. 

So what happens is that you may get output like this:

 - at 0.1 seconds:

	Final output: 2 incomplete
	.. 2 commits listed ..

 - half a second later:

	Final output: 33 incomplete
	.. 33 commits listed ..

 - another half a second after that:	

	Final output: 71 incomplete
	.. 71 commits listed ..

 - another half second later:

	Final output: 136 incomplete
	.. 100 commits listed: we hit the --early-output limit, and
	.. will only output 100 commits, and after this you'll not
	.. see an "incomplete" report any more since you got as much
	.. early output as you asked for!

 - .. and then finally:

	Final output: 73106 done
	.. all the commits ..

The above is a real-life scenario on my current kernel tree after having 
flushed all the caches.

Tested with the experimental gitk patch that Paul sent out, and by looking 
at the actual log output (and verifying that my commit count guesses 
actually match real life fairly well).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Sun, 4 Nov 2007, Linus Torvalds wrote:
> 
> I'm looking at it now, I'll have to think about this a bit more. It might 
> be trivial to fix, but this thing has real potential for being subtle.

It wasn't totally trivial, but it doesn't seem to be excessively subtle 
either. About half the patch is moving around some code to look at whether 
the commit is interesting or not and rewriting the parents, so that it can 
be shared with the revision walker.

 builtin-log.c |   88 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 revision.c    |   63 +++++++++++++++++++++++-----------------
 revision.h    |    8 +++++
 3 files changed, 119 insertions(+), 40 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 707add2..268a7af 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -77,17 +77,85 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
 	}
 }
 
+/*
+ * This gives a rough estimate for how many commits we
+ * will print out in the list.
+ */
+static int estimate_commit_count(struct rev_info *rev, struct commit_list *list)
+{
+	int n = 0;
+
+	while (list) {
+		struct commit *commit = list->item;
+		unsigned int flags = commit->object.flags;
+
+		list = list->next;
+		if (flags & UNINTERESTING)
+			continue;
+		if (rev->prune_fn && rev->dense && !(flags & TREECHANGE)) {
+			if (commit->parents && !commit->parents->next)
+				continue;
+		}
+		n++;
+	}
+	return n;
+}
+
+static void show_early_header(struct rev_info *rev, const char *stage, int nr)
+{
+	if (rev->shown_one) {
+		rev->shown_one = 0;
+		if (rev->commit_format != CMIT_FMT_ONELINE)
+			putchar(rev->diffopt.line_termination);
+	}
+	printf("Final output: %d %s\n", nr, stage);
+}
+
+struct itimerval early_output_timer;
+
 static void log_show_early(struct rev_info *revs, struct commit_list *list)
 {
 	int i = revs->early_output;
+	int show_header = 1;
 
 	sort_in_topological_order(&list, revs->lifo);
 	while (list && i) {
 		struct commit *commit = list->item;
-		log_tree_commit(revs, commit);
+		switch (simplify_commit(revs, commit)) {
+		case commit_show:
+			if (show_header) {
+				int n = estimate_commit_count(revs, list);
+				show_early_header(revs, "incomplete", n);
+				show_header = 0;
+			}
+			log_tree_commit(revs, commit);
+			i--;
+			break;
+		case commit_ignore:
+			break;
+		case commit_error:
+			return;
+		}
 		list = list->next;
-		i--;
 	}
+
+	/* Did we already get enough commits for the early output? */
+	if (!i)
+		return;
+
+	/*
+	 * ..if not, then retry twice a second until we
+	 * do.
+	 *
+	 * NOTE! We don't use "it_interval", because if the
+	 * reader isn't listening, we want our output to be
+	 * throttled by the writing, and not have the timer
+	 * trigger twice a second even if we're blocked on a
+	 * reader!
+	 */
+	early_output_timer.it_value.tv_sec = 0;
+	early_output_timer.it_value.tv_usec = 500000;
+	setitimer(ITIMER_REAL, &early_output_timer, NULL);
 }
 
 static void early_output(int signal)
@@ -98,7 +166,6 @@ static void early_output(int signal)
 static void setup_early_output(struct rev_info *rev)
 {
 	struct sigaction sa;
-	struct itimerval v;
 
 	/*
 	 * Set up the signal handler, minimally intrusively:
@@ -120,21 +187,16 @@ static void setup_early_output(struct rev_info *rev)
 	 *
 	 * This is a one-time-only trigger.
 	 */
-	memset(&v, 0, sizeof(v));
-	v.it_value.tv_sec = 0;
-	v.it_value.tv_usec = 100000;
-	setitimer(ITIMER_REAL, &v, NULL);
+	early_output_timer.it_value.tv_sec = 0;
+	early_output_timer.it_value.tv_usec = 100000;
+	setitimer(ITIMER_REAL, &early_output_timer, NULL);
 }
 
 static void finish_early_output(struct rev_info *rev)
 {
+	int n = estimate_commit_count(rev, rev->commits);
 	signal(SIGALRM, SIG_IGN);
-	if (rev->shown_one) {
-		rev->shown_one = 0;
-		if (rev->commit_format != CMIT_FMT_ONELINE)
-			putchar(rev->diffopt.line_termination);
-	}
-	printf("Final output:\n");
+	show_early_header(rev, "done", n);
 }
 
 static int cmd_log_walk(struct rev_info *rev)
diff --git a/revision.c b/revision.c
index 26610bb..5d6f208 100644
--- a/revision.c
+++ b/revision.c
@@ -1398,6 +1398,36 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 			   commit->buffer, strlen(commit->buffer));
 }
 
+enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
+{
+	if (commit->object.flags & SHOWN)
+		return commit_ignore;
+	if (revs->unpacked && has_sha1_pack(commit->object.sha1, revs->ignore_packed))
+		return commit_ignore;
+	if (commit->object.flags & UNINTERESTING)
+		return commit_ignore;
+	if (revs->min_age != -1 && (commit->date > revs->min_age))
+		return commit_ignore;
+	if (revs->no_merges && commit->parents && commit->parents->next)
+		return commit_ignore;
+	if (!commit_match(commit, revs))
+		return commit_ignore;
+	if (revs->prune_fn && revs->dense) {
+		/* Commit without changes? */
+		if (!(commit->object.flags & TREECHANGE)) {
+			/* drop merges unless we want parenthood */
+			if (!revs->parents)
+				return commit_ignore;
+			/* non-merge - always ignore it */
+			if (!commit->parents || !commit->parents->next)
+				return commit_ignore;
+		}
+		if (revs->parents && rewrite_parents(revs, commit) < 0)
+			return commit_error;
+	}
+	return commit_show;
+}
+
 static struct commit *get_revision_1(struct rev_info *revs)
 {
 	if (!revs->commits)
@@ -1425,36 +1455,15 @@ static struct commit *get_revision_1(struct rev_info *revs)
 			if (add_parents_to_list(revs, commit, &revs->commits) < 0)
 				return NULL;
 		}
-		if (commit->object.flags & SHOWN)
-			continue;
-
-		if (revs->unpacked && has_sha1_pack(commit->object.sha1,
-						    revs->ignore_packed))
-		    continue;
 
-		if (commit->object.flags & UNINTERESTING)
-			continue;
-		if (revs->min_age != -1 && (commit->date > revs->min_age))
-			continue;
-		if (revs->no_merges &&
-		    commit->parents && commit->parents->next)
-			continue;
-		if (!commit_match(commit, revs))
+		switch (simplify_commit(revs, commit)) {
+		case commit_ignore:
 			continue;
-		if (revs->prune_fn && revs->dense) {
-			/* Commit without changes? */
-			if (!(commit->object.flags & TREECHANGE)) {
-				/* drop merges unless we want parenthood */
-				if (!revs->parents)
-					continue;
-				/* non-merge - always ignore it */
-				if (!commit->parents || !commit->parents->next)
-					continue;
-			}
-			if (revs->parents && rewrite_parents(revs, commit) < 0)
-				return NULL;
+		case commit_error:
+			return NULL;
+		default:
+			return commit;
 		}
-		return commit;
 	} while (revs->commits);
 	return NULL;
 }
diff --git a/revision.h b/revision.h
index d8a5a50..2232247 100644
--- a/revision.h
+++ b/revision.h
@@ -133,4 +133,12 @@ extern void add_object(struct object *obj,
 
 extern void add_pending_object(struct rev_info *revs, struct object *obj, const char *name);
 
+enum commit_action {
+	commit_ignore,
+	commit_show,
+	commit_error
+};
+
+extern enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit);
+
 #endif
