Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 6/9] git-hash-object: Add --stdin-paths option
From: Junio C Hamano @ 2007-10-27  1:02 UTC (permalink / raw)
  To: Brian Downing; +Cc: Adam Roben, git
In-Reply-To: <20071026231902.GC2519@lavos.net>

bdowning@lavos.net (Brian Downing) writes:

> On Fri, Oct 26, 2007 at 02:00:47PM -0700, Junio C Hamano wrote:
>> In addition, if you are enhancing cat-file to spew chunked
>> output out, I suspect that there should be a mode of operation
>> for hash-object that eats that data format.  IOW, this pipe
>> 
>> 	git-cat-file --batch <list-of-sha1 |
>>         git-hash-object --batch
>> 
>> should be an intuitive no-op, shouldn't it?
>
> I think that's an obviously good thing to do.  However, given your
> suggested output format (which I also like):
>
>>    * git-cat-file --batch <list-of-sha1
>> 
>>      outputs a record of this form
>> 
>>           <sha1> SP <type> SP <size> LF <contents> LF
>> 
>>      for each of the input lines.
>
> What should the input behavior be?  Obviously the sha1 will probably
> not be known on the input side.  Should that simply be optional (i.e.
> it will accept either "<sha1> SP <type> SP <size>" or "<type> SP <size>"
> or should it only accept the latter, and a dummy sha1 will need to be
> filled in if the sha1 is not known (presumably "000...000")?

Yeah, you caught me ;-)

Either making it optional or requiring a dummy value would work.
If a non-dummy value is given, we could use it to validate it.

But that would not be a useful application anyway.  So perhaps
just the sequence of "<type> SP <size> LF <contents> LF" would
be the most sensible thing to do.

^ permalink raw reply

* Re: [PATCH 2/2] Do the fuzzy rename detection limits with the exact renames removed
From: Linus Torvalds @ 2007-10-27  0:12 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0710261654300.30120@woody.linux-foundation.org>

On Fri, 26 Oct 2007, Linus Torvalds wrote:
> 
> Works For Me(tm), although this isn't all that obviously testable (ie I 
> should find a test that is border-line, and triggers the rename detection 
> limits, but then has enough exact renames that the rename detection array 
> goes away).

Actually, I just tested it. I used that same 100,000 file thing, but added 
one more (larger) file, and did another commit that moved the 100,000 
files exactly, and the one larger file with a small change.

The code before this patch (but with my linear-time rename changes) would 
find the 100,000 exact renames, but would *not* find the one that had a 
small change, because it would hit the rename limit, and wouldn't even 
try.

With these two patches in place, it finds all the exact renames, and once 
it has done that, the resulting rename array is small enough (one single 
unknown target remaining, even if there are 100,001 possible source files) 
that it doesn't trigger the rename limit, and it now finds the remaining 
non-exact rename too.

So it not only looked obvious, it seems to work too.

		Linus

^ permalink raw reply

* [PATCH 2/2] Do the fuzzy rename detection limits with the exact renames removed
From: Linus Torvalds @ 2007-10-26 23:56 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0710261646230.30120@woody.linux-foundation.org>


When we do the fuzzy rename detection, we don't care about the
destinations that we already handled with the exact rename detector.
And, in fact, the code already knew that - but the rename limiter, which
used to run *before* exact renames were detected, did not.

This fixes it so that the rename detection limiter now bases its
decisions on the *remaining* rename counts, rather than the original
ones.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Pretty obvious. It just moves the tests around a bit, and uses the updated 
counters.

Works For Me(tm), although this isn't all that obviously testable (ie I 
should find a test that is border-line, and triggers the rename detection 
limits, but then has enough exact renames that the rename detection array 
goes away).

 diffcore-rename.c |   32 ++++++++++++++++++--------------
 1 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7ed5ef8..f9ebea5 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -435,33 +435,37 @@ void diffcore_rename(struct diff_options *options)
 	 */
 	rename_count = find_exact_renames();
 
+	/* Did we only want exact renames? */
+	if (minimum_score == MAX_SCORE)
+		goto cleanup;
+
+	/*
+	 * Calculate how many renames are left (but all the source
+	 * files still remain as options for rename/copies!)
+	 */
+	num_create = (rename_dst_nr - rename_count);
+	num_src = rename_src_nr;
+
+	/* All done? */
+	if (!num_create)
+		goto cleanup;
+
 	/*
 	 * This basically does a test for the rename matrix not
 	 * growing larger than a "rename_limit" square matrix, ie:
 	 *
-	 *    rename_dst_nr * rename_src_nr > rename_limit * rename_limit
+	 *    num_create * num_src > rename_limit * rename_limit
 	 *
 	 * but handles the potential overflow case specially (and we
 	 * assume at least 32-bit integers)
 	 */
 	if (rename_limit <= 0 || rename_limit > 32767)
 		rename_limit = 32767;
-	if (rename_dst_nr > rename_limit && rename_src_nr > rename_limit)
+	if (num_create > rename_limit && num_src > rename_limit)
 		goto cleanup;
-	if (rename_dst_nr * rename_src_nr > rename_limit * rename_limit)
+	if (num_create * num_src > rename_limit * rename_limit)
 		goto cleanup;
 
-	/* Have we run out the created file pool?  If so we can avoid
-	 * doing the delta matrix altogether.
-	 */
-	if (rename_count == rename_dst_nr)
-		goto cleanup;
-
-	if (minimum_score == MAX_SCORE)
-		goto cleanup;
-
-	num_create = (rename_dst_nr - rename_count);
-	num_src = rename_src_nr;
 	mx = xmalloc(sizeof(*mx) * num_create * num_src);
 	for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
 		int base = dst_cnt * num_src;

^ permalink raw reply related

* [PATCH 1/2] Fix ugly magic special case in exact rename detection
From: Linus Torvalds @ 2007-10-26 23:51 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List

For historical reasons, the exact rename detection had populated the
filespecs for the entries it compared, and the rest of the similarity
analysis depended on that.  I hadn't even bothered to debug why that was
the case when I re-did the rename detection, I just made the new one
have the same broken behaviour, with a note about this special case.

This fixes that fixme.  The reason the exact rename detector needed to
fill in the file sizes of the files it checked was that the _inexact_
rename detector was broken, and started comparing file sizes before it
filled them in.

Fixing that allows the exact phase to do the sane thing of never even
caring (since all *it* cares about is really just the SHA1 itself, not
the size nor the contents).

It turns out that this also indirectly fixes a bug: trying to populate
all the filespecs will run out of virtual memory if there is tons and
tons of possible rename options.  The fuzzy similarity analysis does the
right thing in this regard, and free's the blob info after it has
generated the hash tables, so the special case code caused more trouble
than just some extra illogical code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Without this, when I try to do a "git commit --amend" on my 
horror-repository with the commit that moved around a hundred thousand 
files, I actually have "git runstatus" die() on me, because xmmap() fails 
(mmap() returns NULL due to running out of vm mappings).

Not to mention that this is "obviously correct" (tm) and the "right thing" 
(tm)" to do (famous last words). It's wrong of estimate_similarity() to 
try to compare the sizes of the filespec entries before they have been 
filled in, and it really shouldn't expect the exact rename detection to 
fill them in, since the exact rename detection doesn't even care!

 diffcore-rename.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 3946932..7ed5ef8 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -144,6 +144,20 @@ static int estimate_similarity(struct diff_filespec *src,
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0;

+	/*
+	 * Need to check that source and destination sizes are
+	 * filled in before comparing them.
+	 *
+	 * If we already have "cnt_data" filled in, we know it's
+	 * all good (avoid checking the size for zero, as that
+	 * is a possible size - we really should have a flag to
+	 * say whether the size is valid or not!)
+	 */
+	if (!src->cnt_data && diff_populate_filespec(src, 0))
+		return 0;
+	if (!dst->cnt_data && diff_populate_filespec(dst, 0))
+		return 0;
+
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
 	base_size = ((src->size < dst->size) ? src->size : dst->size);
 	delta_size = max_size - base_size;
@@ -159,11 +173,6 @@ static int estimate_similarity(struct diff_filespec *src,
 	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;

-	if ((!src->cnt_data && diff_populate_filespec(src, 0))
-		|| (!dst->cnt_data && diff_populate_filespec(dst, 0)))
-		return 0; /* error but caught downstream */
-
-
 	delta_limit = (unsigned long)
 		(base_size * (MAX_SCORE-minimum_score) / MAX_SCORE);
 	if (diffcore_count_changes(src, dst,
@@ -270,19 +279,11 @@ static int find_identical_files(struct file_similarity *src,
 	return renames;
 }

-/*
- * Note: the rest of the rename logic depends on this
- * phase also populating all the filespecs for any
- * entry that isn't matched up with an exact rename.
- */
 static void free_similarity_list(struct file_similarity *p)
 {
 	while (p) {
 		struct file_similarity *entry = p;
 		p = p->next;
-
-		/* Stupid special case, see note above! */
-		diff_populate_filespec(entry->filespec, 0);
 		free(entry);
 	}
 }

^ permalink raw reply related

* Re: [PATCH 4/6] copy vs rename detection: avoid unnecessary O(n*m) loops
From: Linus Torvalds @ 2007-10-26 23:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7vy7dpv4ls.fsf@gitster.siamese.dyndns.org>

On Fri, 26 Oct 2007, Junio C Hamano wrote:
> 
> So for a broken pair, the actual value of rename_used does not really 
> matter.  We only care about it not going down to zero.

Correct. The rename_used count really is immaterial *except* for the magic 
distinction between zero ("it's a rename, no original source file left") 
and non-zero ("it's a copy, original source file remains").

Which is why the new counter is so fundamentally simple: by decrementing 
it for each rename we encounter, we automatically get that behaviour of 
"only the last user turns into a 'rename' if the source file really went 
away" that we want. 

The old code did it all with some really expensive loops over the 
remaining renames.

		Linus

^ permalink raw reply

* Re: [PATCH] Make rebase smarter
From: Junio C Hamano @ 2007-10-26 23:29 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Steven Walter, git, federico
In-Reply-To: <Pine.LNX.4.64.0710270013030.4362@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 26 Oct 2007, Junio C Hamano wrote:
>
>> Steven Walter <stevenrwalter@gmail.com> writes:
>> 
>> > It is a common workflow to run "git fetch; git rebase origin/<foo>" 
>> > Where foo is the remote tracking branch.  git-rebase should default to 
>> > using the remote tracking branch if no other ref is given.
>> 
>> This would be a reasonable choice between refusing outright and
>> picking one possible action.
>
> Another sensible choice would be "git rebase FETCH_HEAD", at least just 
> after a "git fetch <nick> <branch>"...

We can get the best of both worlds by noticing a line in
FETCH_HEAD without not-for-merge marker and use that as the
'onto' commit for the rebase.

^ permalink raw reply

* Re: [PATCH 4/6] copy vs rename detection: avoid unnecessary O(n*m) loops
From: Junio C Hamano @ 2007-10-26 23:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0710261600510.30120@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> The nice thing about the whole counting thing is that we really don't even 
> care. What happens is:
>
>  - *if* any rename at all happens, the rename detection will increment the 
>    "rename_used" thing even more for the source (once for each rename)
>
>  - so if the rename_used started out non-zero, it will never become zero 
>    in diff_resolve_rename_copy(), and every single detected "rename" will 
>    be considered a copy - exactly like we want.
>
>  - in other words, a "rename_used++" before rename-detection guarantees 
>    that you never see a rename, only a copy of the source.

Ok, and that is because we _know_ a broken pair means that the
path itself remains in the postimage and any other postimage
that takes from its preimage can never "rename the path away".
They can only be copies.  Which makes sense.

> The above is actually true *even*if* the 
>
> 	if (!strcmp(p->one->path, p->two->path))
>
> code in diff_resolve_rename_copy() actually triggers,...
> ..., so not doing 
> the decrement there is equivalent to doing another pre-increment.

Yes, this is what confused me.  So for a broken pair, the actual
value of rename_used does not really matter.  We only care about
it not going down to zero.

^ permalink raw reply

* Re: [PATCH 6/9] git-hash-object: Add --stdin-paths option
From: Brian Downing @ 2007-10-26 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Adam Roben, git
In-Reply-To: <7vy7dpwpz4.fsf@gitster.siamese.dyndns.org>

On Fri, Oct 26, 2007 at 02:00:47PM -0700, Junio C Hamano wrote:
> In addition, if you are enhancing cat-file to spew chunked
> output out, I suspect that there should be a mode of operation
> for hash-object that eats that data format.  IOW, this pipe
> 
> 	git-cat-file --batch <list-of-sha1 |
>         git-hash-object --batch
> 
> should be an intuitive no-op, shouldn't it?

I think that's an obviously good thing to do.  However, given your
suggested output format (which I also like):

>    * git-cat-file --batch <list-of-sha1
> 
>      outputs a record of this form
> 
>           <sha1> SP <type> SP <size> LF <contents> LF
> 
>      for each of the input lines.

What should the input behavior be?  Obviously the sha1 will probably
not be known on the input side.  Should that simply be optional (i.e.
it will accept either "<sha1> SP <type> SP <size>" or "<type> SP <size>"
or should it only accept the latter, and a dummy sha1 will need to be
filled in if the sha1 is not known (presumably "000...000")?

-bcd

^ permalink raw reply

* Re: [PATCH] Make rebase smarter
From: Johannes Schindelin @ 2007-10-26 23:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Steven Walter, git, federico
In-Reply-To: <7vk5p9wpwd.fsf@gitster.siamese.dyndns.org>

Hi,

On Fri, 26 Oct 2007, Junio C Hamano wrote:

> Steven Walter <stevenrwalter@gmail.com> writes:
> 
> > It is a common workflow to run "git fetch; git rebase origin/<foo>" 
> > Where foo is the remote tracking branch.  git-rebase should default to 
> > using the remote tracking branch if no other ref is given.
> 
> This would be a reasonable choice between refusing outright and
> picking one possible action.

Another sensible choice would be "git rebase FETCH_HEAD", at least just 
after a "git fetch <nick> <branch>"...

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH 4/6] copy vs rename detection: avoid unnecessary O(n*m) loops
From: Linus Torvalds @ 2007-10-26 23:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7vabq5wkri.fsf@gitster.siamese.dyndns.org>



On Fri, 26 Oct 2007, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > @@ -2640,27 +2640,21 @@ static void diff_resolve_rename_copy(void)
> >  		 * either in-place edit or rename/copy edit.
> >  		 */
> >  		else if (DIFF_PAIR_RENAME(p)) {
> > +			/*
> > +			 * A rename might have re-connected a broken
> > +			 * pair up, causing the pathnames to be the
> > +			 * same again. If so, that's not a rename at
> > +			 * all, just a modification..
> > +			 *
> > +			 * Otherwise, see if this source was used for
> > +			 * multiple renames, in which case we decrement
> > +			 * the count, and call it a copy.
> >  			 */
> > +			if (!strcmp(p->one->path, p->two->path))
> > +				p->status = DIFF_STATUS_MODIFIED;
> > +			else if (--p->one->rename_used > 0)
> >  				p->status = DIFF_STATUS_COPIED;
> > +			else
> >  				p->status = DIFF_STATUS_RENAMED;
> >  		}
> >  		else if (hashcmp(p->one->sha1, p->two->sha1) ||
> 
> The interaction between the above and ...

Heh.

I'm pretty sure it's right, because:

> > @@ -338,15 +320,25 @@ void diffcore_rename(struct diff_options *options)
> >  				locate_rename_dst(p->two, 1);
> >  		}
> >  		else if (!DIFF_FILE_VALID(p->two)) {
> > +			/*
> > +			 * If the source is a broken "delete", and
> >  			 * they did not really want to get broken,
> >  			 * that means the source actually stays.
> > +			 * So we increment the "rename_used" score
> > +			 * by one, to indicate ourselves as a user
> > +			 */
> > +			if (p->broken_pair && !p->score)
> > +				p->one->rename_used++;
> > +			register_rename_src(p->one, p->score);
> > +		}
> > +		else if (detect_rename == DIFF_DETECT_COPY) {
> > +			/*
> > +			 * Increment the "rename_used" score by
> > +			 * one, to indicate ourselves as a user.
> >  			 */
> > +			p->one->rename_used++;
> > +			register_rename_src(p->one, p->score);
> >  		}
> >  	}
> >  	if (rename_dst_nr == 0 || rename_src_nr == 0)
> >  		goto cleanup; /* nothing to do */
> 
> ... this part feels a bit too subtle for a still-jet-lagged
> brain to grok.  I wonder what happens if the preimage of a
> broken pair is used as the rename source for more than two
> postimages.

The nice thing about the whole counting thing is that we really don't even 
care. What happens is:

 - *if* any rename at all happens, the rename detection will increment the 
   "rename_used" thing even more for the source (once for each rename)

 - so if the rename_used started out non-zero, it will never become zero 
   in diff_resolve_rename_copy(), and every single detected "rename" will 
   be considered a copy - exactly like we want.

 - in other words, a "rename_used++" before rename-detection guarantees 
   that you never see a rename, only a copy of the source.

The above is actually true *even*if* the 

	if (!strcmp(p->one->path, p->two->path))

code in diff_resolve_rename_copy() actually triggers, so we could have 
(and at one point I did) done the "--p->one->rename_used > 0" test before 
the strcmp() test, and it would all continue to work fine. However, the 
reason that I put the strcmp() first is that it needs to be done whether 
we decide to consider it a "copy" or a "rename" (so we cannot avoid it 
anyway), and *if* it triggers, we actually want to avoid the rename_used 
going down to zero anyway (not that it would, because I think it's bound 
to be one of the cases where we pre-incremented the count), so not doing 
the decrement there is equivalent to doing another pre-increment.

So I think it's all right, and more obviously right than it used to be. 
But hey, it's possible that I missed something.

		Linus

^ permalink raw reply

* Re: [PATCH 1/7] "git" calls help_unknown_cmd(""); "git help" and "git help -a" return 0
From: Junio C Hamano @ 2007-10-26 23:03 UTC (permalink / raw)
  To: Scott Parish; +Cc: git
In-Reply-To: <20071025045228.GE759@srparish.net>

Scott Parish <sRp@srparish.net> writes:

> That's what i was hoping this patch did. I'm not entirely sure how
> its wrong as it seems to work for me.

I misread the patch.  My mistake.

> Regarding "git: '' is not a git-command" the way i was seeing that
> is that git is usually only called with commands, and '' isn't a
> valid command, hence the reason to exit 1, the help is just a nice
> user experience.

But think who would type "git<Enter>".  They are either people
who (1) do not even know that "git" alone is not useful and that
it always wants a subcommand, or (2) know "git<Enter>" is the
same as "git help" and wants to get the "common command list"
quickly.  Technically, "'' is not a git-command" is correct, but
the message does not help either audience, does it?

^ permalink raw reply

* Re: [PATCH 4/6] copy vs rename detection: avoid unnecessary O(n*m) loops
From: Junio C Hamano @ 2007-10-26 22:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0710251119120.30120@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> @@ -2640,27 +2640,21 @@ static void diff_resolve_rename_copy(void)
>  		 * either in-place edit or rename/copy edit.
>  		 */
>  		else if (DIFF_PAIR_RENAME(p)) {
> +			/*
> +			 * A rename might have re-connected a broken
> +			 * pair up, causing the pathnames to be the
> +			 * same again. If so, that's not a rename at
> +			 * all, just a modification..
> +			 *
> +			 * Otherwise, see if this source was used for
> +			 * multiple renames, in which case we decrement
> +			 * the count, and call it a copy.
>  			 */
> +			if (!strcmp(p->one->path, p->two->path))
> +				p->status = DIFF_STATUS_MODIFIED;
> +			else if (--p->one->rename_used > 0)
>  				p->status = DIFF_STATUS_COPIED;
> +			else
>  				p->status = DIFF_STATUS_RENAMED;
>  		}
>  		else if (hashcmp(p->one->sha1, p->two->sha1) ||

The interaction between the above and ...

> @@ -338,15 +320,25 @@ void diffcore_rename(struct diff_options *options)
>  				locate_rename_dst(p->two, 1);
>  		}
>  		else if (!DIFF_FILE_VALID(p->two)) {
> +			/*
> +			 * If the source is a broken "delete", and
>  			 * they did not really want to get broken,
>  			 * that means the source actually stays.
> +			 * So we increment the "rename_used" score
> +			 * by one, to indicate ourselves as a user
> +			 */
> +			if (p->broken_pair && !p->score)
> +				p->one->rename_used++;
> +			register_rename_src(p->one, p->score);
> +		}
> +		else if (detect_rename == DIFF_DETECT_COPY) {
> +			/*
> +			 * Increment the "rename_used" score by
> +			 * one, to indicate ourselves as a user.
>  			 */
> +			p->one->rename_used++;
> +			register_rename_src(p->one, p->score);
>  		}
>  	}
>  	if (rename_dst_nr == 0 || rename_src_nr == 0)
>  		goto cleanup; /* nothing to do */

... this part feels a bit too subtle for a still-jet-lagged
brain to grok.  I wonder what happens if the preimage of a
broken pair is used as the rename source for more than two
postimages.

^ permalink raw reply

* Re: How to remove a specific hunk
From: Olivier Ramonat @ 2007-10-26 21:59 UTC (permalink / raw)
  To: git
In-Reply-To: <47220A05.4040705@wanadoo.fr>

Pascal,

Pascal Obry <pascal.obry@wanadoo.fr> writes:
> Andreas,
>
>> Once you've added the other two hunks, they'll no longer show up in
>> git-diff, so you can do something like this:
>> 
>> $ git-add -i; # add the other two hunks to commit
>> $ git-diff > middle-hunk.patch
>> $ git-apply -R middle-hunk.patch
>> test, test, test
>> $ git-apply middle-hunk.patch
>
> Thanks, this will clearly work. I was expecting something more
> integrated like a "git reset --interactive" or something like that :)

A solution could be :

git add -i 
--> Add the two "good" hunks

git checkout-index file_with_bad_hunk
--> Remove the "bad" hunk by getting the staged version

And then 
git reset HEAD file_with_bad_hunk 
if you want to unstage it.

Olivier

^ permalink raw reply

* Re: How to run git-gui always in English?
From: Alex Riesen @ 2007-10-26 21:41 UTC (permalink / raw)
  To: Peter Karlsson; +Cc: Steffen Prohaska, Shawn O. Pearce, Git Mailing List
In-Reply-To: <Pine.LNX.4.62.0710260857210.3542@perkele.intern.softwolves.pp.se>

Peter Karlsson, Fri, Oct 26, 2007 10:00:32 +0200:
> Steffen Prohaska:
>
>> There are a lot of efforts going on to localize git-gui, including 
>> technical terms like "push". Personally I don't understand what this 
>> should be useful for. The command is called "git push"s. So, why should it 
>> be named differently in the gui.
>
> Not that I agree that "push" is a technical word, but perhaps you have a 
> point. Why should there be such words in the GUI to start with? It's a GUI, 
> trying to abstract away the command line. Why not have a button "Send" or a 
> menu entry "Send changes to server", mimicing the "git push" command line 
> option? Using command line names or showing protocol data directly in a 
> user-oriented GUI is most often a bad idea.
>
> Or perhaps what we need is an actual translation from "gitish" to English, 
> which would have
>
> msgid "Push"
> msgstr "Send changes to server"
>

Because you do not send changes to a _server_. There is no server.
There is just another repo. Hence just "push"

^ permalink raw reply

* Re: [PATCH] Make rebase smarter
From: Junio C Hamano @ 2007-10-26 21:02 UTC (permalink / raw)
  To: Steven Walter; +Cc: git, federico
In-Reply-To: <1193373682-3608-1-git-send-email-stevenrwalter@gmail.com>

Steven Walter <stevenrwalter@gmail.com> writes:

> It is a common workflow to run "git fetch; git rebase origin/<foo>" Where
> foo is the remote tracking branch.  git-rebase should default to using
> the remote tracking branch if no other ref is given.

This would be a reasonable choice between refusing outright and
picking one possible action.  I do not have a strong preference
as to what that "one possible action" should be, but if people
like to base on the remote tracking branch set to merge by
default, I am fine with it.

> +	curr_branch=$(git symbolic-ref -q HEAD)
> +	curr_branch=${curr_branch//refs\/heads\//}
> +	merge=$(git config branch.$curr_branch.merge)
> +	remote=$(git config branch.$curr_branch.remote)
> +	fetch=$(git config remote.$remote.fetch)
> +
> +	expanded=$(git fetch--tool expand-refs-wildcard "0000000000000000000000000000000000000000 $merge" "$remote" "$fetch")
> +	upstream_name=${expanded/#*:/}
> +fi
>  upstream=`git rev-parse --verify "${upstream_name}^0"` ||
>      die "invalid upstream $upstream_name"

 * How does this work if there is no such tracking configuration?

   - branch.<curr>.merge may be missing;
   - branch.<curr>.remote may be missing;
   - remote.<remote>.fetch may be explicit, multiple and/or non-wildcard;

 * ${parameter/pattern/string} is a bashism we do not allow in
   git scripts.

Problems in the implementation aside, it probably makes sense to
first have a helper function that takes a local branch name and
computes the remote tracking branch that a given local branch is
set to merge from, if exists, and use it here.  I suspect there
are other places in the Porcelain that would benefit from such a
helper function.

^ permalink raw reply

* Re: [PATCH] git-fetch: print informative messages to stdout, not stderr
From: Junio C Hamano @ 2007-10-26 21:01 UTC (permalink / raw)
  To: Gerrit Pape; +Cc: git
In-Reply-To: <20071026085355.24930.qmail@2c06371a7c3ae6.315fe32.mid.smarden.org>

Gerrit Pape <pape@smarden.org> writes:

> git-fetch writes informations about storing tags and the like to stderr,
> which should only be used for errors.  This patch changes it to use
> stdout instead.

I do not have strong preference but the reason this goes to
stderr is because it is considered part of the progress
reporting.  It is not designed for consumption by scripted
callers, and its formatting is subject to change freely, which
is what allows the recent "terse fetch" discussion.

^ permalink raw reply

* Re: [PATCH 6/9] git-hash-object: Add --stdin-paths option
From: Junio C Hamano @ 2007-10-26 21:00 UTC (permalink / raw)
  To: Adam Roben; +Cc: git
In-Reply-To: <1193307927-3592-7-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> writes:

> This allows multiple paths to be specified on stdin.

Ok.  List of paths is certainly a good thing to have.

In addition, if you are enhancing cat-file to spew chunked
output out, I suspect that there should be a mode of operation
for hash-object that eats that data format.  IOW, this pipe

	git-cat-file --batch <list-of-sha1 |
        git-hash-object --batch

should be an intuitive no-op, shouldn't it?

^ permalink raw reply

* Re: [PATCH 5/9] Add tests for git hash-object
From: Junio C Hamano @ 2007-10-26 21:00 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Johannes Sixt
In-Reply-To: <1193307927-3592-6-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> writes:

> +test_expect_success \
> +    'hash a file' \
> +    "test $hello_sha1 = \$(git hash-object hello)"
> +
> +test_expect_success \
> +    'hash from stdin' \
> +    "test $hello_sha1 = \$(echo '$hello_content' | git hash-object --stdin)"

Needs to make sure no object has been written to the object
database at this point?

> +test_expect_success \
> +    'hash a file and write to database' \
> +    "test $hello_sha1 = \$(git hash-object -w hello)"

... and make sure the objectis written here?

> +test_expect_success \
> +    'hash from stdin and write to database' \
> +    "test $hello_sha1 = \$(echo '$hello_content' | git hash-object -w --stdin)"
> +
> +test_done

... and/or here?

^ permalink raw reply

* Re: [PATCH 4/9] git-cat-file: Add --stdin option
From: Junio C Hamano @ 2007-10-26 20:59 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Brian Downing
In-Reply-To: <1193307927-3592-5-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> writes:

> @@ -23,6 +23,10 @@ OPTIONS
>  	For a more complete list of ways to spell object names, see
>  	"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
>  
> +--stdin::
> +	Read object names from stdin instead of specifying one on the
> +	command line.
> +

This does not talk about modified output format: what the format
is, nor when that modified format is used.

> @@ -139,16 +139,26 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
>  	if (!buf)
>  		die("git-cat-file %s: bad file", obj_name);
>  
> +	if (print_size) {
> +		printf("%lu\n", size);
> +		fflush(stdout);
> +	}
>  	write_or_die(1, buf, size);
> +	if (print_size) {
> +		printf("\n");
> +		fflush(stdout);
> +	}
>  	return 0;
>  }
>  

Not that I object strongly to it, but do we need extra LF after
the contents?

  - "It would help readers written in typical scripting
    languages" is an acceptable answer, but I doubt that is the
    case --- the reader is given the number of bytes and is
    going to "read($pipe, $buf, $that_size)" anyway.

  - "The reader can assert that one-byte past the content is a
    LF to catch errors, and this LF would help re-synchronize
    after such an error" would be another acceptable answer, but
    for the re-synchronization to work, the output needs to tell
    which record each chunk is about (i.e. if the output were
    "<type> <sha1> <size>LF<contents>LF", the "re-sync" argument
    would make a bit more sense).

> +	print_size = !opt || opt == 'p';

Needs a bit of comment here, and in the documentation.  E.g.

	git-cat-file --stdin -t <list-of-sha1
        git-cat-file --stdin -s <list-of-sha1

	are ways to check types and sizes of the objects in the
	list.

How does --stdin interact with -e?

How does --stdin interact with -p when printing a tree or a tag
object?

How does "blob --stdin" do when input sequence contains a non
blob SHA1?

It almost feels that --stdin should be named something else,
such as --batch or --bulk, as it is not just affecting the
input.

Here is an alternative suggestion.

   Two new options, --batch and --batch-check, are introduced.
   These options are incompatible with -[tsep] or an object type
   given as the first parameter to git-cat-file.

   * git-cat-file --batch-check <list-of-sha1

     outputs a record of this form

          <sha1> SP <type> SP <size> LF

     for each of the input lines.

   * git-cat-file --batch <list-of-sha1

     outputs a record of this form

          <sha1> SP <type> SP <size> LF <contents> LF

     for each of the input lines.

  For a missing object, either option gives a record of form:

          <sha1> SP missing LF

^ permalink raw reply

* Re: [PATCH 3/9] git-cat-file: Make option parsing a little more flexible
From: Junio C Hamano @ 2007-10-26 20:56 UTC (permalink / raw)
  To: Adam Roben; +Cc: git
In-Reply-To: <1193307927-3592-4-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> writes:

> This will make it easier to add newer options later.

A good change in principle.

> diff --git a/builtin-cat-file.c b/builtin-cat-file.c
> index 34a63d1..3a0be4a 100644
> --- a/builtin-cat-file.c
> +++ b/builtin-cat-file.c
> @@ -143,23 +143,41 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
>  	return 0;
>  }
>  
> +static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
> +
>  int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  {
> -	int opt;
> -	const char *exp_type, *obj_name;
> +	int i, opt = 0;
> +	const char *exp_type = 0, *obj_name = 0;

NULL pointer constants in git sources are spelled "NULL", not
"0".

^ permalink raw reply

* Re: best git practices, was Re: Git User's Survey 2007 unfinishedsummary continued
From: David Kastrup @ 2007-10-26 20:01 UTC (permalink / raw)
  To: Federico Mena Quintero; +Cc: Theodore Tso, git
In-Reply-To: <1193335339.4522.398.camel@cacharro.xalalinux.org>

Federico Mena Quintero <federico@novell.com> writes:

> You don't find quantum physicists saying, "... yeah, like Newton's
> brain-damaged followers" :)

Oh, one does get this sort of comments.  Predominantly from those who
understand neither theory.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: git-svnimport
From: Junio C Hamano @ 2007-10-26 19:30 UTC (permalink / raw)
  To: Steven Grimm; +Cc: Johannes Schindelin, Gerrit Pape, git
In-Reply-To: <47222CB3.9070100@midwinter.com>

Steven Grimm <koreth@midwinter.com> writes:

> Johannes Schindelin wrote:
>>> I'm not sure these are worth fixing, I'd rather suggest to drop
>>> git-svnimport in favor of git-svn, and not installing it in future
>>> versions.
>>>
>> I already proposed this.  The outcome was... silent.
>>
>
> Me too, and same reaction.
>
> So I vote we interpret that as, "No objections from anyone."

Likewise.

^ permalink raw reply

* [StGit RFC] A more structured way of calling git
From: Karl Hasselström @ 2007-10-26 19:24 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson

I wanted to build an StGit command that coalesced adjacent patches to
a single patch. Because the end result tree would still be the same,
this should be doable without ever involving HEAD, the index, or the
worktree. StGit's existing infrastructure for manipulating patches
didn't lend itself to doing this kind of thing, though: it's not
modular enough. So I started to design a replacement low-level
interface to git, and things got slightly out of hand ... and I ended
up with a much bigger refactoring than I'd planned.

It's all outlined below, and all the code I currently have is
attached. Unless there's opposition, my plan is to convert one command
at a time to use the new infrastructure -- this can be done since the
on-disk format is unaffected.

Comments?


Python wrapping of git objects (gitlib.py)
----------------------------------------------------------------------

To make it easier to work with git objects, I've built a more
high-level interface than just calling git commands directly. Some of
it is trivial:

  * Blobs and tags aren't covered (yet), since StGit never uses them.
    I think. If they are needed in the future, they can easily be
    wrapped.

  * Trees are represented by the Python class Tree. They are
    immutable, and have only one property: the sha1 of the tree. More
    could be added if we need to look inside trees.

The interesting case is for commit objects:

  * Commits are represented by the Python class Commit. They are
    immutable, and have two properties: sha1 and commit data.

  * Commit data is represented by the Python class CommitData. These
    objects are also immutable, and have properties for author,
    committer, commit message, tree, and list of parent commits. They
    also have setter functions for these properties, which (since
    CommitData objects are immutable) return a modified copy of the
    original object.

  * Author and committer are represented by a Person class, also
    immutable with setter functions.

The user may create new CommitData objects, but never creates Tree or
Commit objects herself. Instead, she asks her Repository object to
create them for her:

  * Repository.get_tree and Repository.get_commit take sha1
    parameters, and returns the corresponding objects. They must
    already exist.

  * Repository.commit takes a CommitData parameter, and returns a
    corresponding Commit object representing the new commit.
    Internally, it runs git-commit-tree.

This has the nice property that Tree and Commit objects always
represent objects that git knows about. It also makes it trivial to
create new commits that have arbitrary existing commits as parents and
an existing tree. For example, my coalesce-adjacent-patches command
could be built on top of this.

The Repository object also has methods for reading and writing (and
caching) refs, but it lacks any method for creating new trees. This is
the job of the Index object. It has read_tree and write_tree methods
for getting Trees in to and out of the current index state. It also
has a merge method that does a 3-way merge between arbitrary Trees,
without ever touching the worktree (if the merge cannot be resolved
automatically, it simply fails).

The user is free to create as many Repository and Index objects as she
wants; their constructors take a git repository path and an index file
path as argumentes, respectively. This means that it's very easy to
work with a temporary index, which is neat in combination with the
Index.merge method: it lets you merge three trees to create a fourth
without ever touching the worktree or the default index.

For operations that involve a worktree as well, we have the
IndexAndWorktree class. It has methods for e.g. checkout and
update-index; this is also where a full, possibly conflicting merge
will go when I get around to implementing it.


Low-level StGit on top of the git wrappers (stacklib.py)
----------------------------------------------------------------------

That was all about git. We need an StGit layer on top of it.

There's a Stack object that represents a branch. It has two important
properties:

  * A PatchOrder object. This keeps track of the list of applied and
    unapplied patches, by name.

  * A store of Patch objects. This can look up Patch objects by name,
    and create new patches.

Patch objects represent patches, and are very simple. Basically the
only thing you can do with them is get their commit object, set their
commit object, and delete them. Author, commit message, top and
bottom, and all those things aren't a property of the patch; they are
properties of its commit.

(In the future, Patch objects should write stuff to the patch log.
They could also during a gradual transition to this new infrastructure
write out the per-patch metadata files that StGit currently uses.)

Importantly, unlike the current StGit stack class, there are no
high-level stack operations à la push and pop here. This is all
low-level manipulation of patch refs and the applied/unapplied files.
But in combination with the stuff in gitlib.py, lots of higher-level
StGit operations can be built on top of this.


Transactions (translib.py)
----------------------------------------------------------------------

I started to implement a few StGit commands on top of gitlib.py and
stacklib.py, and then realized something very appealing:

  Just about every StGit command can be accomplished by first creating
  a bunch of new commit objects with gitlib.py, then trying to check
  out the new HEAD, and then rewriting refs with stacklib.py. Only the
  first and second steps can possibly fail, and if they do, they do so
  without leaving any user-visible junk behind. This can be used to
  make all commands either succeed completely, or do nothing at all.

As an example (which I've not yet implemented), consider how push
would work:

  1. Create the new commit objects that the patches to be pushed will
     use. For each patch:

       a. Check if it's a fast forward. If so, just reuse the old
          commit object.

       b. Try the in-index merge with a temp index. If it succeeds,
          create a new commit object with that tree.

       c. Otherwise, stop trying to push any more patches.

  2. Check out the new HEAD tree. This may fail if the worktree and/or
     index contain conflicting changes. If so, we just abort the whole
     operation and tell the user which files she needs to clean up.

       a. If we had a patch that we couldn't push in (1.c), and then
          do a full 3-way merge with its original tree. This may fail
          if the worktree and/or index is dirty; if so, we don't try
          to push that patch.

       b. If the merge succeeds but with conflicts, create a new
          commit for it with the same tree as its parent (i.e. an
          empty commit) and leave the conflicts for the user to
          resolve.

       c. Otherwise, the merge autoresolved. Go back to (1) and try to
          push the remaining patches too. But remember that if we
          later need to abort the push due to dirty worktree/index, we
          have already pushed a few of the patches.

  3. Use stacklib.py to rewrite the branch ref and the patch refs.

This will end up pushing some subset of the requested patches. The
only way we'll ever get a result that isn't all-or-nothing is if a
merge conflicts. Note also how (except for the irritating (2.c)) we
never touch the index and worktree until we're already done, which
should make things both robust and fast.

(Step (2.c) is irritating, in that we actually have to check out a new
tree in order to use merge-recursive, and merge-recursive might
autoresolve a merge that the in-index merge failed to resolve, so that
we have checked out an intermediate tree even though there didn't end
up being any conflict for the user to resolve.)

The code in translib.py is a simple class that can hold a record of
everything that needs to be done in step (3), and then does it when
and if we get there.

The killer feature of transactions (apart from their use as a utility
when writing commands) is that we could build transaction logging.
Since every StGit command performs exatly one transaction, if we
simply logged the before and after values of the patch refs, branch
ref, and patch appliedness, we could build a generic StGit undo/redo
command.


Example commands (utillib.py)
----------------------------------------------------------------------

This file has sample implementations of some StGit commands: clean,
pop, push, and refresh. They don't have any bells and whistles, and
the push is fundamentally limited in that it doesn't handle conflicts
-- it'll complain and do nothing.

These were mostly done to excercise the new infrastructure and make
sure that I hadn't forgotten anything. The plan is not to replace the
existing commands, just make them use the new infrastructure.




diff --git a/stgit/gitlib.py b/stgit/gitlib.py
new file mode 100644
index 0000000..46911d5
--- /dev/null
+++ b/stgit/gitlib.py
@@ -0,0 +1,360 @@
+import os, os.path, re
+from exception import *
+import run
+
+class DetachedHeadException(StgException):
+    pass
+
+class Repr(object):
+    def __repr__(self):
+        return str(self)
+
+class NoValue(object):
+    pass
+
+def make_defaults(defaults):
+    def d(val, attr):
+        if val != NoValue:
+            return val
+        elif defaults != NoValue:
+            return getattr(defaults, attr)
+        else:
+            return None
+    return d
+
+class Person(Repr):
+    """Immutable."""
+    def __init__(self, name = NoValue, email = NoValue,
+                 date = NoValue, defaults = NoValue):
+        d = make_defaults(defaults)
+        self.__name = d(name, 'name')
+        self.__email = d(email, 'email')
+        self.__date = d(date, 'date')
+    name = property(lambda self: self.__name)
+    email = property(lambda self: self.__email)
+    date = property(lambda self: self.__date)
+    def set_name(self, name):
+        return type(self)(name = name, defaults = self)
+    def set_email(self, email):
+        return type(self)(email = email, defaults = self)
+    def set_date(self, date):
+        return type(self)(date = date, defaults = self)
+    def __str__(self):
+        return '%s <%s> %s' % (self.name, self.email, self.date)
+    @classmethod
+    def parse(cls, s):
+        m = re.match(r'^([^<]*)<([^>]*)>\s+(\d+\s+[+-]\d{4})$', s)
+        assert m
+        name = m.group(1).strip()
+        email = m.group(2)
+        date = m.group(3)
+        return cls(name, email, date)
+
+class Tree(Repr):
+    """Immutable."""
+    def __init__(self, sha1):
+        self.__sha1 = sha1
+    sha1 = property(lambda self: self.__sha1)
+    def __str__(self):
+        return 'Tree<%s>' % self.sha1
+
+class Commitdata(Repr):
+    """Immutable."""
+    def __init__(self, tree = NoValue, parents = NoValue, author = NoValue,
+                 committer = NoValue, message = NoValue, defaults = NoValue):
+        d = make_defaults(defaults)
+        self.__tree = d(tree, 'tree')
+        self.__parents = d(parents, 'parents')
+        self.__author = d(author, 'author')
+        self.__committer = d(committer, 'committer')
+        self.__message = d(message, 'message')
+    tree = property(lambda self: self.__tree)
+    parents = property(lambda self: self.__parents)
+    @property
+    def parent(self):
+        assert len(self.__parents) == 1
+        return self.__parents[0]
+    author = property(lambda self: self.__author)
+    committer = property(lambda self: self.__committer)
+    message = property(lambda self: self.__message)
+    def set_tree(self, tree):
+        return type(self)(tree = tree, defaults = self)
+    def set_parents(self, parents):
+        return type(self)(parents = parents, defaults = self)
+    def add_parent(self, parent):
+        return type(self)(parents = list(self.parents or []) + [parent],
+                          defaults = self)
+    def set_parent(self, parent):
+        return self.set_parents([parent])
+    def set_author(self, author):
+        return type(self)(author = author, defaults = self)
+    def set_committer(self, committer):
+        return type(self)(committer = committer, defaults = self)
+    def set_message(self, message):
+        return type(self)(message = message, defaults = self)
+    def __str__(self):
+        if self.tree == None:
+            tree = None
+        else:
+            tree = self.tree.sha1
+        if self.parents == None:
+            parents = None
+        else:
+            parents = [p.sha1 for p in self.parents]
+        return ('Commitdata<tree: %s, parents: %s, author: %s,'
+                ' committer: %s, message: "%s">'
+                ) % (tree, parents, self.author, self.committer, self.message)
+    @classmethod
+    def parse(cls, repository, s):
+        cd = cls()
+        lines = list(s.splitlines(True))
+        for i in xrange(len(lines)):
+            line = lines[i].strip()
+            if not line:
+                return cd.set_message(''.join(lines[i+1:]))
+            key, value = line.split(None, 1)
+            if key == 'tree':
+                cd = cd.set_tree(repository.get_tree(value))
+            elif key == 'parent':
+                cd = cd.add_parent(repository.get_commit(value))
+            elif key == 'author':
+                cd = cd.set_author(Person.parse(value))
+            elif key == 'committer':
+                cd = cd.set_committer(Person.parse(value))
+            else:
+                assert False
+        assert False
+
+class Commit(Repr):
+    """Immutable."""
+    def __init__(self, repository, sha1):
+        self.__sha1 = sha1
+        self.__repository = repository
+        self.__data = None
+    sha1 = property(lambda self: self.__sha1)
+    @property
+    def data(self):
+        if self.__data == None:
+            self.__data = Commitdata.parse(
+                self.__repository,
+                self.__repository.cat_object(self.sha1))
+        return self.__data
+    def __str__(self):
+        return 'Commit<sha1: %s, data: %s>' % (self.sha1, self.__data)
+
+class Refs(object):
+    def __init__(self, repository):
+        self.__repository = repository
+        self.__refs = None
+    def __cache_refs(self):
+        self.__refs = {}
+        for line in self.__repository.run(['git-show-ref']).output_lines():
+            m = re.match(r'^([0-9a-f]{40})\s+(\S+)$', line)
+            sha1, ref = m.groups()
+            self.__refs[ref] = sha1
+    def get(self, ref):
+        if self.__refs == None:
+            self.__cache_refs()
+        return self.__repository.get_commit(self.__refs[ref])
+    def set(self, ref, commit, msg):
+        if self.__refs == None:
+            self.__cache_refs()
+        old_sha1 = self.__refs.get(ref, '0'*40)
+        new_sha1 = commit.sha1
+        if old_sha1 != new_sha1:
+            self.__repository.run(['git-update-ref', '-m', msg,
+                                   ref, new_sha1, old_sha1]).no_output()
+            self.__refs[ref] = new_sha1
+    def delete(self, ref):
+        if self.__refs == None:
+            self.__cache_refs()
+        self.__repository.run(['git-update-ref',
+                               '-d', ref, self.__refs[ref]]).no_output()
+        del self.__refs[ref]
+
+class ObjectCache(object):
+    def __init__(self, create):
+        self.__objects = {}
+        self.__create = create
+    def __getitem__(self, name):
+        if not name in self.__objects:
+            self.__objects[name] = self.__create(name)
+        return self.__objects[name]
+    def __contains__(self, name):
+        return name in self.__objects
+    def __setitem__(self, name, val):
+        assert not name in self.__objects
+        self.__objects[name] = val
+
+def add_dict(d1, d2):
+    d = dict(d1)
+    d.update(d2)
+    return d
+
+class RunWithEnv(object):
+    def run(self, args, env = {}):
+        return run.Run(*args).env(add_dict(self.env, env))
+
+class Repository(RunWithEnv):
+    def __init__(self, directory):
+        self.__git_dir = directory
+        self.__refs = Refs(self)
+        self.__trees = ObjectCache(lambda sha1: Tree(sha1))
+        self.__commits = ObjectCache(lambda sha1: Commit(self, sha1))
+    env = property(lambda self: { 'GIT_DIR': self.__git_dir })
+    @classmethod
+    def default(cls):
+        """Return the default repository."""
+        return cls(run.Run('git-rev-parse', '--git-dir').output_one_line())
+    def default_index(self):
+        return Index(self, (os.environ.get('GIT_INDEX_FILE', None)
+                            or os.path.join(self.__git_dir, 'index')))
+    def temp_index(self):
+        return Index(self, self.__git_dir)
+    def default_worktree(self):
+        path = os.environ.get('GIT_WORK_TREE', None)
+        if not path:
+            o = run.Run('git-rev-parse', '--show-cdup').output_lines()
+            o = o or ['.']
+            assert len(o) == 1
+            path = o[0]
+        return Worktree(path)
+    def default_iw(self):
+        return IndexAndWorktree(self.default_index(), self.default_worktree())
+    directory = property(lambda self: self.__git_dir)
+    refs = property(lambda self: self.__refs)
+    def cat_object(self, sha1):
+        return self.run(['git-cat-file', '-p', sha1]).raw_output()
+    def get_tree(self, sha1):
+        return self.__trees[sha1]
+    def get_commit(self, sha1):
+        return self.__commits[sha1]
+    def commit(self, commitdata):
+        c = ['git-commit-tree', commitdata.tree.sha1]
+        for p in commitdata.parents:
+            c.append('-p')
+            c.append(p.sha1)
+        env = {}
+        for p, v1 in ((commitdata.author, 'AUTHOR'),
+                       (commitdata.committer, 'COMMITTER')):
+            if p != None:
+                for attr, v2 in (('name', 'NAME'), ('email', 'EMAIL'),
+                                 ('date', 'DATE')):
+                    if getattr(p, attr) != None:
+                        env['GIT_%s_%s' % (v1, v2)] = getattr(p, attr)
+        sha1 = self.run(c, env = env).raw_input(commitdata.message
+                                                ).output_one_line()
+        return self.get_commit(sha1)
+    @property
+    def head(self):
+        try:
+            return self.run(['git-symbolic-ref', '-q', 'HEAD']
+                            ).output_one_line()
+        except run.RunException:
+            raise DetachedHeadException()
+    def set_head(self, ref, msg):
+        self.run(['git-symbolic-ref', '-m', msg, 'HEAD', ref]).no_output()
+    @property
+    def head_commit(self):
+        return self.get_commit(self.run(['git-rev-parse', 'HEAD']
+                                        ).output_one_line())
+
+class MergeException(StgException):
+    pass
+
+class Index(RunWithEnv):
+    def __init__(self, repository, filename):
+        self.__repository = repository
+        if os.path.isdir(filename):
+            # Create a temp index in the given directory.
+            self.__filename = os.path.join(
+                filename, 'index.temp-%d-%x' % (os.getpid(), id(self)))
+            self.delete()
+        else:
+            self.__filename = filename
+    env = property(lambda self: add_dict(self.__repository.env,
+                                         { 'GIT_INDEX_FILE': self.__filename }))
+    def read_tree(self, tree):
+        self.run(['git-read-tree', tree.sha1]).no_output()
+    def write_tree(self):
+        return self.__repository.get_tree(
+            self.run(['git-write-tree']).output_one_line())
+    def is_clean(self):
+        try:
+            self.run(['git-update-index', '--refresh']).discard_output()
+        except run.RunException:
+            return False
+        else:
+            return True
+    def merge(self, base, ours, theirs):
+        """In-index merge, no worktree involved."""
+        self.run(['git-read-tree', '-m', '-i', '--aggressive',
+                  base.sha1, ours.sha1, theirs.sha1]).no_output()
+        try:
+            self.run(['git-merge-index', 'git-merge-one-file', '-a']
+                     ).no_output()
+        except run.RunException:
+            raise MergeException('In-index merge failed due to conflicts')
+    def delete(self):
+        if os.path.isfile(self.__filename):
+            os.remove(self.__filename)
+
+class Worktree(object):
+    def __init__(self, directory):
+        self.__directory = directory
+    env = property(lambda self: { 'GIT_WORK_TREE': self.__directory })
+
+class CheckoutException(StgException):
+    pass
+
+class IndexAndWorktree(RunWithEnv):
+    def __init__(self, index, worktree):
+        self.__index = index
+        self.__worktree = worktree
+    index = property(lambda self: self.__index)
+    env = property(lambda self: add_dict(self.__index.env, self.__worktree.env))
+    def checkout(self, old_commit, new_commit):
+        # TODO: Optionally do a 3-way instead of doing nothing when we
+        # have a problem. Or maybe we should stash changes in a patch?
+        try:
+            self.run(['git-read-tree', '-u', '-m',
+                      '--exclude-per-directory=.gitignore',
+                      old_commit.sha1, new_commit.sha1]
+                     ).discard_output()
+        except run.RunException:
+            raise CheckoutException('Index/workdir dirty')
+    def changed_files(self):
+        return self.run(['git-diff-files', '--name-only']).output_lines()
+    def update_index(self, files):
+        self.run(['git-update-index', '--remove', '-z', '--stdin']
+                 ).input_nulterm(files).discard_output()
+
+if __name__ == '__main__':
+    testdir = '/tmp/stgtest'
+    os.system('rm -rf %s' % testdir)
+    os.makedirs(testdir)
+    os.chdir(testdir)
+    for c in ['git init',
+              'echo foo >> foo',
+              'git add foo',
+              'git commit -m foo',
+              'echo bar >> foo',
+              'git commit -a -m foo']:
+        os.system(c)
+    r = Repository(os.path.join(testdir, '.git'))
+    head = r.head
+    c = r.refs.get(head)
+    print 'HEAD is', head, 'which is', c
+    c.data
+    print 'Expanded:', c
+    maja = Person(name = 'Maja', email = 'maja@example.com')
+    nisse = Person(name = 'Nisse', email = 'nisse@example.com')
+    c2 = r.commit(c.data.set_parents([c]).set_author(maja))
+    c3 = r.commit(c.data.set_parents([c]).set_author(nisse))
+    c4 = r.commit(c.data.set_parents([c2, c3]))
+    r.refs.set(head, c4, 'made a cool merge')
+    c5 = r.commit(c.data.set_parents([c4]).set_tree(
+        c.data.parents[0].data.tree))
+    head = 'refs/heads/foobar'
+    r.refs.set(head, c5, 'committed a revert')
+    r.set_head(head, 'switched to other branch')
diff --git a/stgit/run.py b/stgit/run.py
index 924e59a..43c3a23 100644
--- a/stgit/run.py
+++ b/stgit/run.py
@@ -105,7 +105,7 @@ class Run:
     def input_lines(self, lines):
         self.__indata = ''.join(['%s\n' % line for line in lines])
         return self
-    def input_nulterm(self, items):
+    def input_nulterm(self, lines):
         self.__indata = ''.join('%s\0' % line for line in lines)
         return self
     def no_output(self):
diff --git a/stgit/stacklib.py b/stgit/stacklib.py
new file mode 100644
index 0000000..06ba007
--- /dev/null
+++ b/stgit/stacklib.py
@@ -0,0 +1,120 @@
+import os.path
+import gitlib, utils
+
+class Patch(object):
+    def __init__(self, stack, name):
+        self.__stack = stack
+        self.__name = name
+    name = property(lambda self: self.__name)
+    def __ref(self):
+        return 'refs/patches/%s/%s' % (self.__stack.name, self.__name)
+    @property
+    def commit(self):
+        return self.__stack.repository.refs.get(self.__ref())
+    def set_commit(self, commit, msg):
+        self.__stack.repository.refs.set(self.__ref(), commit, msg)
+    def delete(self):
+        self.__stack.repository.refs.delete(self.__ref())
+    def is_applied(self):
+        return self.name in self.__stack.patchorder.applied
+    def is_empty(self):
+        c = self.commit
+        return c.data.tree == c.data.parent.data.tree
+
+class PatchOrder(object):
+    """Keeps track of patch order, and which patches are applied.
+    Works with patch names, not actual patches."""
+    __list_order = [ 'applied', 'unapplied' ]
+    def __init__(self, stack):
+        self.__stack = stack
+        self.__lists = {}
+    def __read_file(self, fn):
+        return tuple(utils.read_strings(
+            os.path.join(self.__stack.directory, fn)))
+    def __write_file(self, fn, val):
+        utils.write_strings(os.path.join(self.__stack.directory, fn), val)
+    def __get_list(self, name):
+        if not name in self.__lists:
+            self.__lists[name] = self.__read_file(name)
+        return self.__lists[name]
+    def __set_list(self, name, val):
+        val = tuple(val)
+        if val != self.__lists.get(name, None):
+            self.__lists[name] = val
+            self.__write_file(name, val)
+    applied = property(lambda self: self.__get_list('applied'),
+                       lambda self, val: self.__set_list('applied', val))
+    unapplied = property(lambda self: self.__get_list('unapplied'),
+                         lambda self, val: self.__set_list('unapplied', val))
+
+class Patches(object):
+    """Creates Patch objects."""
+    def __init__(self, stack):
+        self.__stack = stack
+        def create_patch(name):
+            p = Patch(self.__stack, name)
+            p.commit # raise exception if the patch doesn't exist
+            return p
+        self.__patches = gitlib.ObjectCache(create_patch) # name -> Patch
+    def exists(self, name):
+        return name in self.__patches
+    def get(self, name):
+        return self.__patches[name]
+    def new(self, name, commit, msg):
+        assert not name in self.__patches
+        p = Patch(self.__stack, name)
+        p.set_commit(commit, msg)
+        self.__patches[name] = p
+        return p
+
+class Stack(object):
+    def __init__(self, repository, name):
+        self.__repository = repository
+        self.__name = name
+        self.__patchorder = PatchOrder(self)
+        self.__patches = Patches(self)
+    name = property(lambda self: self.__name)
+    repository = property(lambda self: self.__repository)
+    patchorder = property(lambda self: self.__patchorder)
+    patches = property(lambda self: self.__patches)
+    @property
+    def directory(self):
+        return os.path.join(self.__repository.directory, 'patches', self.__name)
+    def __ref(self):
+        return 'refs/heads/%s' % self.__name
+    @property
+    def head(self):
+        return self.__repository.refs.get(self.__ref())
+    def set_head(self, commit, msg):
+        self.__repository.refs.set(self.__ref(), commit, msg)
+    @property
+    def base(self):
+        if self.patchorder.applied:
+            return self.patches.get(self.patchorder.applied[0]
+                                    ).commit.data.parent
+        else:
+            return self.head
+
+def strip_leading(prefix, s):
+    assert s.startswith(prefix)
+    return s[len(prefix):]
+
+class Repository(gitlib.Repository):
+    def __init__(self, *args, **kwargs):
+        gitlib.Repository.__init__(self, *args, **kwargs)
+        self.__stacks = {} # name -> Stack
+    @property
+    def current_branch(self):
+        return strip_leading('refs/heads/', self.head)
+    @property
+    def current_stack(self):
+        return self.get_stack(self.current_branch)
+    def get_stack(self, name):
+        if not name in self.__stacks:
+            if name == None:
+                s = None # detached HEAD
+            else:
+                # TODO: verify that the branch exists
+                s = Stack(self, name)
+            self.__stacks[name] = s
+        return self.__stacks[name]
diff --git a/stgit/translib.py b/stgit/translib.py
new file mode 100644
index 0000000..deb3420
--- /dev/null
+++ b/stgit/translib.py
@@ -0,0 +1,70 @@
+import gitlib
+from exception import *
+from out import *
+
+class TransactionException(StgException):
+    pass
+
+class StackTransaction(object):
+    def __init__(self, stack, msg):
+        self.__stack = stack
+        self.__msg = msg
+        self.__patches = {}
+        self.__applied = list(self.__stack.patchorder.applied)
+        self.__unapplied = list(self.__stack.patchorder.unapplied)
+    def __set_patches(self, val):
+        self.__patches = dict(val)
+    patches = property(lambda self: self.__patches, __set_patches)
+    def __set_applied(self, val):
+        self.__applied = list(val)
+    applied = property(lambda self: self.__applied, __set_applied)
+    def __set_unapplied(self, val):
+        self.__unapplied = list(val)
+    unapplied = property(lambda self: self.__unapplied, __set_unapplied)
+    def __check_consistency(self):
+        remaining = set(self.__applied + self.__unapplied)
+        for pn, commit in self.__patches.iteritems():
+            if commit == None:
+                assert self.__stack.patches.exists(pn)
+            else:
+                assert pn in remaining
+    def run(self, iw = None):
+        self.__check_consistency()
+
+        # Get new head commit.
+        if self.__applied:
+            top_patch = self.__applied[-1]
+            try:
+                new_head = self.__patches[top_patch]
+            except KeyError:
+                new_head = self.__stack.patches.get(top_patch).commit
+        else:
+            new_head = self.__stack.base
+
+        # Set branch head.
+        if new_head == self.__stack.head:
+            out.info('Head remains at %s' % new_head.sha1[:8])
+        elif new_head.data.tree == self.__stack.head.data.tree:
+            out.info('Head %s -> %s (same tree)' % (self.__stack.head.sha1[:8],
+                                                    new_head.sha1[:8]))
+        elif iw != None:
+            try:
+                iw.checkout(self.__stack.head, new_head)
+            except gitlib.CheckoutException, e:
+                raise TransactionException(e)
+            out.info('Head %s -> %s' % (self.__stack.head.sha1[:8],
+                                        new_head.sha1[:8]))
+        self.__stack.set_head(new_head, self.__msg)
+
+        # Write patches.
+        for pn, commit in self.__patches.iteritems():
+            if self.__stack.patches.exists(pn):
+                p = self.__stack.patches.get(pn)
+                if commit == None:
+                    p.delete()
+                else:
+                    p.set_commit(commit, self.__msg)
+            else:
+                self.__stack.patches.new(pn, commit, self.__msg)
+        self.__stack.patchorder.applied = self.__applied
+        self.__stack.patchorder.unapplied = self.__unapplied
diff --git a/stgit/utillib.py b/stgit/utillib.py
new file mode 100644
index 0000000..d09ecc4
--- /dev/null
+++ b/stgit/utillib.py
@@ -0,0 +1,139 @@
+import gitlib, translib
+from out import *
+
+def head_top_equal(repository):
+    head = repository.head_commit
+    try:
+        s = repository.current_stack
+    except gitlib.DetachedHeadException:
+        out.error('Not on any branch (detached HEAD)')
+        return False
+    applied = s.patchorder.applied
+    if not applied:
+        return True
+    top = s.patches.get(applied[-1])
+    if top.commit == head:
+        return True
+    out.error('The top patch (%s, %s)' % (top.name, top.commit.sha1),
+              'and HEAD (%s) are not the same.' % head.sha1)
+    return False
+
+def simple_merge(repository, base, ours, theirs):
+    """Given three trees, tries to do an in-index merge in a temporary
+    index with a temporary index. Returns the result tree, or None if
+    the merge failed (due to conflicts)."""
+    assert isinstance(base, gitlib.Tree)
+    assert isinstance(ours, gitlib.Tree)
+    assert isinstance(theirs, gitlib.Tree)
+    if base == ours:
+        # Fast forward: theirs is a descendant of ours.
+        return theirs
+    if base == theirs:
+        # Fast forward: ours is a descendant of theirs.
+        return ours
+    index = repository.temp_index()
+    try:
+        try:
+            index.merge(base, ours, theirs)
+        except gitlib.MergeException:
+            return None
+        return index.write_tree()
+    finally:
+        index.delete()
+
+def clean(stack):
+    t = translib.StackTransaction(stack, 'stg clean')
+    t.unapplied = []
+    for pn in stack.patchorder.unapplied:
+        p = stack.patches.get(pn)
+        if p.is_empty():
+            t.patches[pn] = None
+        else:
+            t.unapplied.append[pn]
+    t.applied = []
+    parent = stack.base
+    for pn in stack.patchorder.applied:
+        p = stack.patches.get(pn)
+        if p.is_empty():
+            t.patches[pn] = None
+            out.info('Deleting %s' % pn)
+        else:
+            if parent != p.commit.data.parent:
+                parent = t.patches[pn] = stack.repository.commit(
+                    p.commit.data.set_parent(parent))
+            else:
+                parent = p.commit
+            t.applied.append(pn)
+            out.info('Keeping %s' % pn)
+    t.run()
+
+def pop(stack, iw = None):
+    t = translib.StackTransaction(stack, 'stg pop')
+    pn = t.applied.pop()
+    t.unapplied.insert(0, pn)
+    t.run(iw)
+
+def push(stack, pn, iw = None):
+    t = translib.StackTransaction(stack, 'stg push')
+    t.unapplied.remove(pn)
+    t.applied.append(pn)
+    p = stack.patches.get(pn)
+    if stack.head != p.commit.data.parent:
+        tree = simple_merge(stack.repository, p.commit.data.parent.data.tree,
+                            stack.head.data.tree, p.commit.data.tree)
+        assert tree
+        t.patches[pn] = stack.repository.commit(
+            p.commit.data.set_parent(stack.head).set_tree(tree))
+    t.run(iw)
+
+def refresh(stack, iw):
+    iw.update_index(iw.changed_files())
+    tree = iw.index.write_tree()
+    t = translib.StackTransaction(stack, 'stg refresh')
+    p = stack.patches.get(t.applied[-1])
+    t.patches[p.name] = stack.repository.commit(
+        p.commit.data.set_tree(tree))
+    t.run()
+
+if __name__ == '__main__':
+    import os
+    import stacklib
+    testdir = '/tmp/stgtest'
+    os.system('rm -rf %s' % testdir)
+    os.makedirs(testdir)
+    os.chdir(testdir)
+    for c in ['git init',
+              'echo foo >> foo',
+              'git add foo',
+              'git commit -m foo',
+              'stg init']:
+        os.system(c)
+    for i in range(3):
+        for c in ['stg new p%d -m p%d' % (i, i),
+                  'echo %s >> foo%d' % (str(i)*4, i),
+                  'git add foo%d' % i,
+                  'stg refresh',
+                  'stg new q%d -m q%d' % (i, i)]:
+            os.system(c)
+    r = stacklib.Repository.default()
+    print 'Current branch:', r.current_branch
+    s = r.current_stack
+    print 'Applied:', s.patchorder.applied
+    print 'Unapplied:', s.patchorder.unapplied
+    os.system('git checkout HEAD^')
+    head_top_equal(r)
+    os.system('git checkout master')
+    head_top_equal(r)
+    clean(s)
+    iw = r.default_iw()
+    pop(s, iw)
+    pop(s, iw)
+    os.system('stg series')
+    os.system('stg status')
+    push(s, 'p2', iw)
+    os.system('stg series')
+    os.system('stg status')
+    os.system('echo 333 >> foo0')
+    refresh(s, iw)
+    os.system('stg series')
+    os.system('stg status')

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply related

* Re: How to remove a specific hunk
From: Alex Riesen @ 2007-10-26 19:20 UTC (permalink / raw)
  To: Pascal Obry; +Cc: git
In-Reply-To: <4722036E.5030204@wanadoo.fr>

Pascal Obry, Fri, Oct 26, 2007 17:10:38 +0200:
> 
> I'm very new to Git... but start to love it :)
> 
> Before committing sometimes I want to remove a specific hunk. Say in
> file a.txt I have in the diff 3 hunks, I want to revert/delete/remove
> the second one. Is there a way to do that ?

Take a look at git-gui. Try right-clicking in the diff pane at the
bottom.

^ permalink raw reply

* Re: How to remove a specific hunk
From: Johannes Schindelin @ 2007-10-26 18:19 UTC (permalink / raw)
  To: Benoit SIGOURE; +Cc: Pascal Obry, git list
In-Reply-To: <2669F76D-6FF2-4CCF-9337-639D84EE65E8@lrde.epita.fr>

Hi,

On Fri, 26 Oct 2007, Benoit SIGOURE wrote:

> On Oct 26, 2007, at 5:38 PM, Pascal Obry wrote:
> 
> > Andreas,
> > 
> > > Once you've added the other two hunks, they'll no longer show up in
> > > git-diff, so you can do something like this:
> > > 
> > > $ git-add -i; # add the other two hunks to commit
> > > $ git-diff > middle-hunk.patch
> > > $ git-apply -R middle-hunk.patch
> > > test, test, test
> > > $ git-apply middle-hunk.patch
> > 
> > Thanks, this will clearly work. I was expecting something more
> > integrated like a "git reset --interactive" or something like that :)
> 
> That'd be great!  :)

I skipped over the beginnings of this thread because of time constraints, 
but would "git reset HEAD^ && git add -i" not helped you?  git add -i 
allows you to stage hunks, so by just _not_ staging _that_ hunk but 
everything else, should have worked for you, right?

There's also git-gui which does all that graphically for you (remember the 
right mouse button).

Ciao,
Dscho

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox