Git development


* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Petr Baudis @ 2005-05-18 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds
In-Reply-To: <7vzmutqz5f.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Wed, May 18, 2005 at 12:13:32AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> PB> But that won't work well enough for me. E.g. when committing in a
> PB> subdirectory, I want to commit only changes made in the subdirectory,
> PB> etc.
> 
> Assuming that you have something that lets you commit selected
> files when you are at the top level (say cg-commit), and further
> assuming that today it only works from the toplevel, that is:
> 
>     $ pwd
>     /usr/src/linux
>     $ cg-commit fs/ext?/Makefile
> 
> works today, what I am saying is:
> 
>     $ pwd
>     /usr/src/linux/fs
>     $ git-run-with-user-path cg-commit -- ext?/Makefile
> 
> would work.

Yes. But if you do just cg-commit in the subdirectory, it won't work.
You could pass the original directory in some environment variable or
whatever, but I think that's just not worth the trouble for Cogito -
it's much easier for it when you just stay in the directory you are in
and instead set the environment variables so that the git toolkit DTRT.
(I like this acronym. :-)

> BTW, I am wondering if your choice of cg-commit as an example
> (as opposed to something else like diff or add) is a flamebait
> or just an innocent random example ;-)?

It was completely innocent. :-) How would it be a flamebait?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* [PATCH] fix show_date() for positive timezones
From: Nicolas Pitre @ 2005-05-18 21:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git


Signed-off-by: Nicolas Pitre <nico@cam.org>

--- a/date.c
+++ b/date.c
@@ -51,9 +51,9 @@ const char *show_date(unsigned long time
 	int minutes;
 
 	minutes = tz < 0 ? -tz : tz;
-	minutes = (tz / 100)*60 + (tz % 100);
+	minutes = (minutes / 100)*60 + (minutes % 100);
 	minutes = tz < 0 ? -minutes : minutes;
-	t = time - minutes * 60;
+	t = time + minutes * 60;
 	tm = gmtime(&t);
 	if (!tm)
 		return NULL;
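
For reference, a minimal standalone sketch of the arithmetic the patch above corrects. It assumes tz is the zone written as a signed decimal HHMM number (so +0200 is 200 and -0700 is -700); the helper names tz_to_minutes and shift_to_local are illustrative and are not taken from date.c.

```c
#include <time.h>

/*
 * Convert a zone offset written as signed decimal HHMM into signed
 * minutes.  The hours/minutes split is done on the absolute value and
 * the sign is restored afterwards.
 */
int tz_to_minutes(int tz)
{
    int minutes = tz < 0 ? -tz : tz;

    minutes = (minutes / 100) * 60 + (minutes % 100);
    return tz < 0 ? -minutes : minutes;
}

/* Shift a UTC timestamp so that gmtime() on the result yields local fields. */
time_t shift_to_local(time_t utc, int tz)
{
    return utc + (time_t)tz_to_minutes(tz) * 60;
}
```

With these, tz_to_minutes(530) is 330 and tz_to_minutes(-330) is -210. Zones east of UTC end up added to the timestamp, which is what the second hunk of the patch changes.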


* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 20:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <7vll6cnup4.fsf@assigned-by-dhcp.cox.net>



On Wed, 18 May 2005, Junio C Hamano wrote:
> 
> I suspect doing something like this might be saner instead,
> assuming non raw-diffs come at the end.  

It won't ever trigger, since we only exit the loop once we see EOF.

So the non-diffs at the end will trigger by the exact same "oh, we didn't 
recognize it" logic.

		Linus


* Re: [PATCH 0/1] Diff-helper update
From: Junio C Hamano @ 2005-05-18 20:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505181134470.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I took the liberty of doing just that. The only subtle issue was that 
LT> the strbuf functions would consider an empty line to be EOF, which looked 
LT> wrong and unintentional. Fixing that made the actual diff-helper changes 
LT> totally trivial, and I can now do
LT> 	git-rev-list HEAD | git-diff-tree -r -v --stdin | ./git-diff-helper -r | less -S
LT> and it does the right thing for me.

Thanks for fixing up strbuf.

@@ -136,8 +268,12 @@ int main(int ac, const char **av) {
 		if (sb.eof)
 			break;
 		status = parse_diff_raw_output(sb.buf, av+1, ac-1, reverse);
-		if (status)
-			fprintf(stderr, "cannot parse %s\n", sb.buf);
+		if (status) {
+			flush_renames(av+1, ac-1, reverse);
+			printf("%s%c", sb.buf, line_termination);
+		}
 	}
+
+	flush_renames(av+1, ac-1, reverse);
 	return 0;
 }

I suspect doing something like this might be saner instead,
assuming non raw-diffs come at the end.  

		if (status)
			break;
	}
	flush_renames(av+1, ac-1, reverse);
	if (!sb.eof) {
		/* spit out what we have in sb.buf, then sendfile ;-) the
		 * rest of the input to the output.
		 */
	}
	return 0;


* git-diff-tree updates..
From: Linus Torvalds @ 2005-05-18 20:28 UTC (permalink / raw)
  To: Git Mailing List

I've just fixed two annoyances of mine with git-diff-tree, which sadly 
caused me to break some syntax.

In particular, diff-tree for some unfathomable reason (probably incipient 
braindamage in yours truly) used a single dash "-" to mark the end of 
command line arguments, rather than the "--" that everybody else uses.

I hope nobody depended on it, because I fixed it.
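
As an illustration of the "--" convention, here is a minimal sketch of how a command might locate the end of its options. The function name find_end_of_options is hypothetical; this is not the code that went into diff-tree.

```c
#include <string.h>

/*
 * Return the index of the first "--" in argv[1..argc-1], or argc when
 * there is none.  Everything after that index is a path, never an option.
 */
int find_end_of_options(int argc, const char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        if (!strcmp(argv[i], "--"))
            return i;
    return argc;
}
```

A lone "-" is deliberately not matched here, so it stays available for a different meaning.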

The other thing I did was to allow a single SHA1, and then consider that 
to be equivalent to a one-line "--stdin" thing. Ie you can now do

	git-diff-tree -v -p HEAD

and it will do what you'd expect it to do, ie it should be equivalent to

	cat .git/HEAD | git-diff-tree -v -p --stdin

(apart from a silly bug which I'll fix shortly).

The latter means that if you actually want to track a _file_ named HEAD
(or anything else that might trigger as a reference), you'd need to do

	git-rev-list HEAD | git-diff-tree -v -p --stdin -- HEAD

but I'm considering making the single-dash thing be equivalent to 
the combination "--stdin", and not allow SHA1 naming after it, so that 
this could be shortened to just be

	git-rev-list HEAD | git-diff-tree -v -p - HEAD

(but I wanted to make the "-" semantics change be a two-phase thing).

		Linus


* Re: [PATCH] improved delta support for git
From: Dan Holmsand @ 2005-05-18 19:32 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.62.0505181428170.20274@localhost.localdomain>

Nicolas Pitre wrote:
> On Wed, 18 May 2005, Dan Holmsand wrote:
>>Nicolas Pitre wrote:
>>It's probably better to skip deltafication of very small files altogether. Big
>>pain for small gain, and all that.
> 
> No, that's not what I mean.
> 
> Suppose a large source file that may change only one line between two 
> versions.  The delta may therefore end up being only a few bytes long.  
> Compressing a few bytes with zlib creates a _larger_ file than the 
> original few bytes.

Oh, I see. You're right, of course. I doubt, however, that there are 
really large gains to be made, measured in bytes; since small files tend 
to be small :-) It might nevertheless be worthwhile to skip processing 
of really small files, though, as there's basically no hope of gaining 
anything by deltafication.

Anyway, I've also noticed that deltas compress a lot worse than regular 
blobs. In particular, "complex deltas" (i.e. small changes on lines 
1,3,5,9,12, etc.), compress poorly. That might explain why my simplistic 
"depth-one-deltas-against-a-common-keyframe" method works comparatively 
well, as that should tend to produce cleaner deltas from time to time (i.e. 
chunk on lines 1-12 got replaced by new stuff), that ought to be easier 
to compress.

>>>Well, any delta object smaller than its original object saves space, even if
>>>it's 75% of the original size. But...
>>
>>That's not true if you want to keep the delta chain length down (and thus
>>performance up).
> 
> Sure.  That's why I added the -d switch to mkdelta.  But if you can fit 
> a delta which is 75% the size of its original object size then you still 
> save 25% of the space, regardless of the delta chain length.

Ok, I guess we're talking about slightly different things here. I was 
talking about my simple-and-fast method of processing one new version of 
an object at a time, deltafying each new version against the first 
previous non-deltafied one. If you, in that scenario, allow use of up to 
75% sized deltas, on average delta size will probably be something like 37%.

> In fact it seems that deltas might be significantly harder to compress.  
> Therefore a test on the resulting file should probably be done as well 
> to make sure we don't end up with a delta larger than the original 
> object.

Yeah (see above). That's just one of the things I cheated my way out of, 
by limiting delta size to 10% (trusting that compression doesn't 
increase size by 90%).

>>>... but then the ultimate solution is to try out all possible references
>>>within a given list.  My git-deltafy-script already finds out the list of
>>>objects belonging to the same file.  Maybe git-mkdelta should try all
>>>combinations between them.  This way a deeper delta chain could be allowed
>>>for maximum space saving.
>>
>>Yeah. But then you lose the ability to do incremental deltafication, or
>>deltafication on-the-fly.
> 
> 
> Not at all.  Nothing prevents you from making the latest revision of a 
> file be the reference object and the previous revision turned into a 
> delta against that latest revision, even if it was itself a reference 
> object before.  The only thing that must be avoided is a delta loop and 
> current mkdelta code takes care of that already.

Sure. But there's some downside to modifying already existing objects. 
In particular, downloads using the existing methods (both http and 
rsync) won't get your new, smaller objects. If someone is pulling from a 
deltafied repository often enough, and the objects of the top-most 
commit are always non-deltafied, they will never see any deltas. Object 
immutability is a really good thing.

And there might be some performance issues involved, if you'd like to do 
deltafying at commit-time (not that I've actually tried this, or 
anything)...

Thanks for your comments!

/dan


* Re: [PATCH 0/1] Diff-helper update
From: Thomas Glanzmann @ 2005-05-18 18:52 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.58.0505181134470.18337@ppc970.osdl.org>

Hello,

> 	git-rev-list HEAD | git-diff-tree -r -v --stdin | ./git-diff-helper -r | less -S

nice one!

	Thomas


* 'git resolve'
From: Thomas Glanzmann @ 2005-05-18 18:50 UTC (permalink / raw)
  To: GIT

Hello,
does anyone out there have a git-resolve script? One that guides the user
through resolving whatever is not handled by the automatic merge or the
three-way merge?

	Thomas


* Re: [PATCH] improved delta support for git
From: Nicolas Pitre @ 2005-05-18 18:41 UTC (permalink / raw)
  To: Dan Holmsand; +Cc: git
In-Reply-To: <d6evrk$jv2$1@sea.gmane.org>

On Wed, 18 May 2005, Dan Holmsand wrote:

> Nicolas Pitre wrote:
> > One thing I've been wondering about is whether gzipping small deltas is
> > actually a gain.  For very small files it seems that gzip is adding more
> > overhead making the compressed file actually larger.  Might be worth storing
> > some deltas uncompressed if the compressed version turns out to be larger.
> 
> It's probably better to skip deltafication of very small files altogether. Big
> pain for small gain, and all that.

No, that's not what I mean.

Suppose a large source file that may change only one line between two 
versions.  The delta may therefore end up being only a few bytes long.  
Compressing a few bytes with zlib creates a _larger_ file than the 
original few bytes.
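
A minimal sketch of the check suggested at the top of this message: keep the deflated form of a delta only when it is actually smaller. The names are hypothetical and nothing here is taken from mkdelta. How a reader would tell a raw delta from a deflated one on disk is left open. The one hard fact used is that zlib framing alone costs 6 bytes (a 2-byte header plus a 4-byte Adler-32 trailer).

```c
#include <stddef.h>

/* A zlib stream carries a 2-byte header and a 4-byte Adler-32 trailer. */
#define ZLIB_FRAMING_BYTES 6

enum delta_storage { STORE_RAW, STORE_DEFLATED };

/* Keep the deflated form only when it is strictly smaller than the raw delta. */
enum delta_storage pick_delta_storage(size_t raw_len, size_t deflated_len)
{
    return deflated_len < raw_len ? STORE_DEFLATED : STORE_RAW;
}

/*
 * Cheap pre-check: a delta no longer than the framing alone can never
 * shrink, so the compression attempt can be skipped outright.
 */
int deflate_is_pointless(size_t raw_len)
{
    return raw_len <= ZLIB_FRAMING_BYTES;
}
```

For a one-line change that yields a 4-byte delta, deflate_is_pointless() already says not to bother.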

> > Well, any delta object smaller than its original object saves space, even if
> > it's 75% of the original size. But...
> 
> That's not true if you want to keep the delta chain length down (and thus
> performance up).

Sure.  That's why I added the -d switch to mkdelta.  But if you can fit 
a delta which is 75% the size of its original object size then you still 
save 25% of the space, regardless of the delta chain length.

> But in this case, the trick is to know when to stop deltafying against one
> base file, and start over with another. If you switch to a new keyframe too
> often, you obviously lose some potential savings. But if you don't switch
> often enough, you end up repeating the same data in too many delta files.

That's why multiple combinations should be tried.  And to keep things 
under control then a new argument specifying the delta "distance" might 
limit the number of trials.

> A maximum delta size of 10% turned out to be ideal for at least the "fs"
> tree. 8% was significantly worse, as was 15%. (The ideal size depends on how
> big the average change is: the smaller the average change, the smaller the max
> delta size should be).

In fact it seems that deltas might be significantly harder to compress.  
Therefore a test on the resulting file should probably be done as well 
to make sure we don't end up with a delta larger than the original 
object.

> > ... but then the ultimate solution is to try out all possible references
> > within a given list.  My git-deltafy-script already finds out the list of
> > objects belonging to the same file.  Maybe git-mkdelta should try all
> > combinations between them.  This way a deeper delta chain could be allowed
> > for maximum space saving.
> 
> Yeah. But then you lose the ability to do incremental deltafication, or
> deltafication on-the-fly.

Not at all.  Nothing prevents you from making the latest revision of a 
file be the reference object and the previous revision turned into a 
delta against that latest revision, even if it was itself a reference 
object before.  The only thing that must be avoided is a delta loop and 
current mkdelta code takes care of that already.

Nicolas


* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 18:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505181110480.18337@ppc970.osdl.org>

On Wed, 18 May 2005, Linus Torvalds wrote:
> 
> If diff-helper just passes the lines it doesn't understand through
> unmodified (_after_ having handled any pending rename logic), it will 
> automatically do the right thing.

I took the liberty of doing just that. The only subtle issue was that 
the strbuf functions would consider an empty line to be EOF, which looked 
wrong and unintentional. Fixing that made the actual diff-helper changes 
totally trivial, and I can now do

	git-rev-list HEAD | git-diff-tree -r -v --stdin | ./git-diff-helper -r | less -S

and it does the right thing for me.

		Linus


* Re: Core and Not-So Core
From: Juliusz Chroboczek @ 2005-05-18 18:35 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <2cfc40320505100800426d38ca@mail.gmail.com>

> To give a concrete example: the cache currently contains most of the
> posix stat structure primarily to allow quick change detection. In the
> Java world, most of the posix stat structure is not directly
> accessible via the pure-Java file system abstractions. However, for
> most purposes detecting changes to a file's modification time and
> size would be enough.

I've got exactly this problem in Darcs-git; and I ignore all of the
cached data except the file size, mtime and sha1.  I don't currently
ever write to the cache.
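
A rough sketch of that kind of quick check, assuming a cache entry that keeps only size and mtime. The struct and function names are made up for illustration and do not match the layout of the Git cache.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cut-down cache entry: just the two fields compared below. */
struct cached_entry {
    off_t size;
    time_t mtime;
};

/*
 * Nonzero when the file still matches the cached size and mtime.
 * A match is only a hint; comparing the sha1 is what settles it.
 */
int entry_looks_unchanged(const struct cached_entry *ce, const struct stat *st)
{
    return ce->size == st->st_size && ce->mtime == st->st_mtime;
}
```

The caller would fill st with stat() or lstat() and fall back to hashing the file when this returns zero.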

> I think it would be worthwhile if care was taken to draw a distinction
> between the repository and the cache aspects of the git core, perhaps
> even going to the extreme of moving all knowledge of the  cache into
> cogito itself.

There's nothing that prevents you from ignoring the Git cache and
using your own cache instead.

                                        Juliusz



* Re: Cogito updates?
From: Petr Baudis @ 2005-05-18 18:26 UTC (permalink / raw)
  To: Zack Brown; +Cc: git
In-Reply-To: <20050518145325.GG7391@tumblerings.org>

Dear diary, on Wed, May 18, 2005 at 04:53:25PM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> Hi Petr,

Hello,

> I'm tracking rsync://rsync.kernel.org/pub/scm/cogito/cogito.git
> 
> I see no updates since May 15, but tons before that. Is my repo broken? From the
> list traffic, it seems there are a lot of patches going in.

those patches are mostly going to git-pb. Cogito will get its turn on
Friday afternoon.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 18:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <7v64xgpgb0.fsf@assigned-by-dhcp.cox.net>



On Wed, 18 May 2005, Junio C Hamano wrote:
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> However, git-diff-helper doesn't understand these things, and the builtin
> LT> diff doesn't do the rename thing. Yet it would be very very useful to do.
> 
> It is unclear what you meant by "these things" in "doesn't
> understand these things", and what you meant by "it" in "it
> would be very very useful to do."  Could you explain?

"These things" being the extra output from "diff-tree" that is not a "diff 
line".

If diff-helper just passes the lines it doesn't understand through
unmodified (_after_ having handled any pending rename logic), it will 
automatically do the right thing.
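
The pass-through idea can be sketched roughly as follows. This is not diff-helper code: the struct, the function names and the "diff: " tag are invented for illustration, the caller is assumed to supply the result of its own parser as parsed_ok, and output goes to a small zero-initialized buffer instead of stdout so the ordering is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 64

struct passthru {
    const char *pending[MAX_PENDING]; /* parsed diff lines held back */
    size_t nr_pending;
    char out[1024];                   /* collected output; must start zeroed */
};

/* Append "<tag><line>\n" to the output buffer, truncating if it is full. */
static void emit(struct passthru *p, const char *tag, const char *line)
{
    size_t used = strlen(p->out);

    snprintf(p->out + used, sizeof(p->out) - used, "%s%s\n", tag, line);
}

/* Rename matching over the held-back lines would go here; this sketch
 * simply emits them in order. */
void flush_pending(struct passthru *p)
{
    size_t i;

    for (i = 0; i < p->nr_pending; i++)
        emit(p, "diff: ", p->pending[i]);
    p->nr_pending = 0;
}

/* parsed_ok is nonzero when the caller's parser recognized the line. */
void feed_line(struct passthru *p, const char *line, int parsed_ok)
{
    if (parsed_ok) {
        if (p->nr_pending == MAX_PENDING)
            flush_pending(p);
        p->pending[p->nr_pending++] = line;
        return;
    }
    flush_pending(p);   /* settle pending renames before the foreign line */
    emit(p, "", line);  /* then pass it through unmodified */
}
```

A final flush_pending() after the last input line covers the case where the stream ends with diff lines.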

> About the built-in diff not doing the rename, I have a bit
> longer term (knowing _my_ timescale I'd imagine you would
> understand that is not that long ;-) plan to have -p option for
> diff-* family to use the same rename detection logic that I
> added to diff-helper in the patch you are commenting on.

Goodie. I was hoping that was the case, but felt that the diff-helper 
thing should be pretty easy to do.

			Linus


* Re: [PATCH 0/1] Diff-helper update
From: Junio C Hamano @ 2005-05-18 17:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505180821470.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> However, git-diff-helper doesn't understand these things, and the builtin
LT> diff doesn't do the rename thing. Yet it would be very very useful to do.

It is unclear what you meant by "these things" in "doesn't
understand these things", and what you meant by "it" in "it
would be very very useful to do."  Could you explain?

About the built-in diff not doing the rename, I have a bit
longer term (knowing _my_ timescale I'd imagine you would
understand that is not that long ;-) plan to have -p option for
diff-* family to use the same rename detection logic that I
added to diff-helper in the patch you are commenting on.  It
involves slight change to callers (three diff-* family main
programs) to add a call to tell the diff driver "I've given you
all the diffs, now go look for renames" at the end, and the
rest is changes to what diff.c does internally.

LT> So what I'd suggest is one (or both) of two possibilities:
LT>  - make the internal diff logic also able to do the same rename handling 
LT>    as the external diff-helper. This may or may not be complex, I've not 
LT>    looked at it.

Yes that is part of the plan.  I wanted to do things in these
steps:

  - Put rename detect in helper so screwups there would not
    impact the diff-* family's built-in output, with the initial
    dumb rename detection.

  - Improve rename detection still keeping the logic and
    machinery in diff-helper only.  I expect a heuristic
    similar to the one you posted on the deltification thread
    would work nicely here as well.

  - Straighten out the GIT_EXTERNAL_DIFF interface so that it
    can also express renames (the patch I sent currently punts
    there).  They will get eighth argument (rename destination)
    only when they are being fed a rename patch.
    git-apply-patch-script needs to be adjusted for this change.

  - Change diff-helper not to do the rename detection itself,
    but clean it up so it uses the same diff_addremove(),
    diff_change(), and diff_unmerge() interface the diff-*
    family use.  Change the implementation of these three
    functions so that they do not directly call
    run_external_diff() but pool changes for later matching when
    rename detection is in effect.  Add diff_finished() which
    would flush the rename candidate pools, and call that at the
    end of program from three diff-* family and diff-helper.
    The rename detection logic in diff-helper will be moved to
    this "inspect the pooled rename candidates, match them up
    and flush" part.

The patch I sent is the first step in the above sequence.

LT>  - change diff-helper subtly: instead of printing "cannot parse %s", any
LT>    nonrecognized line would be a "ignore this line, but process all
LT>    pending potential renames".

Once the built-in diff driver is straightened out the way I
outlined above, this change may turn out to be unnecessary, I
need to look at the whatchanged output and think a bit more
about this later today.


* Re: [PATCH] improved delta support for git
From: Dan Holmsand @ 2005-05-18 17:15 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.58.0505180754060.18337@ppc970.osdl.org>

Linus Torvalds wrote:
> Has anybody tried:
> 
>  4) don't limit yourself to previous-history-objects
> 
> One of the things I liked best about the delta patches was that it is
> history-neutral, and can happily delta an object against any other random
> object, in the same tree, in a future tree, or in a past tree.
> 
> Even without any history at all, there should be a noticeable amount of 
> delta opportunities, as different architectures often end up sharing files 
> that are quite similar, but not exactly the same.
> 
> Now, that's a very expensive thing to do, since it changes the question of
> "which object should I delta against" from O(1) to O(n) (where "n" is the
> total number of objects), and thus the whole deltafication from O(n) to
> O(n**2), but especially together with your "max 10%" rule, you should be
> able to limit your choices very effectively: if you know that your delta
> should be within 10% of the total size, you can limit your "let's try that
> object" search to other objects that are also within 10% of your object.

Yeah, that sounds very interesting *and* very expensive...

Ideally, I'd like to find the set of objects that should *not* be 
deltafied (i.e. the ideal "keyframe" objects), but that would generate 
the maximum number of small, depth-one deltas with the least total size. 
But I can't really see how that could be done in a number of 
deltafications significantly less than the number of atoms in the 
universe. Let me think about that some more, though.

I'd like to try a couple of other approaches anyway:

a) Sort all objects by size. Start by biggest (or smallest), and try to 
get as many max-10%-deltas out of that as possible, stopping the search 
when objects get too small (big) according to some size difference 
limit. Cross already deltafied objects off the list, and continue. Might 
work, and might be fast enough with a sufficiently small size limit.

b) Use the same history-based approach as before, and in addition try to 
deltafy any "new" objects against other new objects and previous ones 
(say one or two commits back) in a given size range. That should catch 
renames, copies of the same template, etc. That shouldn't really affect 
performance, as new files are added comparatively seldom.

> Doing this experiment at least once should be interesting. It may turn out
> that the incremental space savings aren't all that noticeable, and that
> the pure history-based one already finds 90% of all savings, making the
> expensive version not worth it. It would be nice to _know_, though.

I definitely agree. And I also agree that the history-based 
deltafication seems less than pure, from a "git, the object store that 
doesn't really care" point of view.

On the other hand, the history-based thing has its advantages. It takes 
advantage of people's hard work to make patches as small as possible. 
It's fast. And (perhaps more importantly), it's deterministic. The 
"ideal" approach could possibly require every single blob to be 
redeltafied when a new object is added, if we want to stay ideal.

And it could be done at commit-time, thus keeping git's nice promise of 
immutable files, while still keeping size requirements down. And as my 
current method gives roughly an 80% size reduction over "plain git", 
that might (by boring, excessively practical people) be considered 
enough :-)

/dan


* [PATCH] compile fixes for solaris: include limits.h one more time
From: Thomas Glanzmann @ 2005-05-18 16:46 UTC (permalink / raw)
  To: GIT

[-- Attachment #1: Type: text/plain, Size: 131 bytes --]

[PATCH] compile fixes for solaris: include limits.h one more time

Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 204 bytes --]

--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -34,6 +34,7 @@
  */
 #include <sys/types.h>
 #include <dirent.h>
+#include <limits.h>
 #include "cache.h"
 
 static int force = 0, quiet = 0, not_new = 0;


* Build change for Darcs-git
From: Juliusz Chroboczek @ 2005-05-18 16:46 UTC (permalink / raw)
  To: darcs-devel, Git Mailing List, darcs-devel

Note to anyone using Darcs-git: from now on you need to
``configure --enable-git'' in order to get Git support.

                                        Juliusz


* [PATCH] Fix diff output take #4.
From: Junio C Hamano @ 2005-05-18 16:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matthias Urlichs, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505180819190.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Yes, that makes sense. It's not three flags "g" "i" and "t", it's the 
LT> "git" flag.

Concurred.  This is against the tip of your tree.  Pasky already
has a version with '-git' in his tree but I trust he can deal
with that single byte change locally.

------------
[PATCH] Fix diff output take #4.

This implements the output format suggested by Linus in
<Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>, except the
imaginary diff option is spelled "diff --git" with double
dashes.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff.c                 |   14 +++++++-------
t/t4000-diff-format.sh |    7 +++++--
2 files changed, 12 insertions(+), 9 deletions(-)

diff -git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -83,7 +83,6 @@
 			 struct diff_tempfile *temp)
 {
 	int i, next_at;
-	const char *git_prefix = "# mode: ";
 	const char *diff_cmd = "diff -L'%s%s' -L'%s%s'";
 	const char *diff_arg  = "'%s' '%s'||:"; /* "||:" is to return 0 */
 	const char *input_name_sq[2];
@@ -123,15 +122,16 @@
 	next_at += snprintf(cmd+next_at, cmd_size-next_at,
 			    diff_arg, input_name_sq[0], input_name_sq[1]);
 
+	printf("diff --git a/%s b/%s\n", name, name);
 	if (!path1[0][0])
-		printf("%s. %s %s\n", git_prefix, temp[1].mode, name);
+		printf("new file mode %s\n", temp[1].mode);
 	else if (!path1[1][0])
-		printf("%s%s . %s\n", git_prefix, temp[0].mode, name);
+		printf("deleted file mode %s\n", temp[0].mode);
 	else {
-		if (strcmp(temp[0].mode, temp[1].mode))
-			printf("%s%s %s %s\n", git_prefix,
-			       temp[0].mode, temp[1].mode, name);
-
+		if (strcmp(temp[0].mode, temp[1].mode)) {
+			printf("old mode %s\n", temp[0].mode);
+			printf("new mode %s\n", temp[1].mode);
+		}
 		if (strncmp(temp[0].mode, temp[1].mode, 3))
 			/* we do not run diff between different kind
 			 * of objects.
diff -git a/t/t4000-diff-format.sh b/t/t4000-diff-format.sh
--- a/t/t4000-diff-format.sh
+++ b/t/t4000-diff-format.sh
@@ -26,7 +26,9 @@
     'git-diff-files -p after editing work tree.' \
     'git-diff-files -p >current'
 cat >expected <<\EOF
-# mode: 100644 100755 path0
+diff --git a/path0 b/path0
+old mode 100644
+new mode 100755
 --- a/path0
 +++ b/path0
 @@ -1,3 +1,3 @@
@@ -34,7 +36,8 @@
  Line 2
 -line 3
 +Line 3
-# mode: 100755 . path1
+diff --git a/path1 b/path1
+deleted file mode 100755
 --- a/path1
 +++ /dev/null
 @@ -1,3 +0,0 @@
------------------------------------------------
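
To make the new header format concrete, here is a small standalone sketch that builds the same lines into a buffer. It mirrors the printf calls in the patch above, but the function name and its calling convention (a NULL mode standing for a missing side) are assumptions made for this example; diff.c itself works from temp[] and path1[].

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the "diff --git" header lines for one path into buf.
 * old_mode == NULL means the file is new; new_mode == NULL means it
 * was deleted.  Returns what snprintf() returns.
 */
int format_git_header(char *buf, size_t size, const char *name,
                      const char *old_mode, const char *new_mode)
{
    assert(old_mode || new_mode); /* at least one side must exist */

    if (!old_mode)
        return snprintf(buf, size,
                        "diff --git a/%s b/%s\nnew file mode %s\n",
                        name, name, new_mode);
    if (!new_mode)
        return snprintf(buf, size,
                        "diff --git a/%s b/%s\ndeleted file mode %s\n",
                        name, name, old_mode);
    if (strcmp(old_mode, new_mode))
        return snprintf(buf, size,
                        "diff --git a/%s b/%s\nold mode %s\nnew mode %s\n",
                        name, name, old_mode, new_mode);
    return snprintf(buf, size, "diff --git a/%s b/%s\n", name, name);
}
```

Called with ("path0", "100644", "100755") this yields the three header lines that the updated t4000 test expects for path0.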




* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Matthias Urlichs @ 2005-05-18 16:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <Pine.LNX.4.58.0505180819190.18337@ppc970.osdl.org>

Hi,

Linus Torvalds:
>  It's not three flags "g" "i" and "t",

Or the (hitherto nonexisting, at least in GNU diff) '-g' flag with an
"it" argument. ;-)

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de


* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 15:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <7v3bslqc94.fsf@assigned-by-dhcp.cox.net>

On Tue, 17 May 2005, Junio C Hamano wrote:
>
> This is just a cover letter but the next patch implements the
> rename detection I told you about.

Ok. I now have one more worry: git-diff-tree with "--stdin".

I actually find that thing to be supremely useful, and I use it for not 
just my silly git-whatchanged script, but also to make release notes. I 
just do

	git-rev-tree HEAD ^LAST_RELEASE | cut -d' ' -f2 | git-diff-tree -v -s --stdin

and that's wonderful. And sometimes I use "-p" instead of "-s", because 
I'm not making release notes, but because I'm doing a "whatchanged" with 
full diffs.

In other words, try something like this on the kernel tree:

	git-whatchanged -p include/asm-i386 arch/i386 | less -S

and stare in wonder at just how _useful_ this simple thing is.

Maybe it's just that I've not used all that many different SCM systems,
and I've certainly not used all the features of the ones I _have_ used, so
maybe I never knew, but dammit, I've never seen anybody else do something
quite that useful (at least doing it fast enough that it _remains_
useful).

(Replace "arch/i386" with "drivers/usb" or "fs/ext3" or whatever,
depending on just what your area of interest happens to be).

In other words, I'm very happy with git.

However, git-diff-helper doesn't understand these things, and the builtin
diff doesn't do the rename thing. Yet it would be very very useful to do.

Now, if you do just a

	git-whatchanged include/asm-i386 arch/i386

(or you can even use "-z" if you want to), it turns out that git-diff-tree 
actually does output perfectly usable material. Each "set of diffs" is 
clearly separated, and there is no ambiguous material: all lines are 
either:
 - empty or start with a whitespace
 - start with "diff-tree ", "Author: " or "Date: "
 - are valid input for diff-helper
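
The first two classes in that list are easy to test for. A minimal sketch, with a hypothetical function name, of a check a filter could run before handing a line to the raw-diff parser:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Nonzero for lines that are commentary rather than diff input:
 * empty lines, lines starting with whitespace, and the three header
 * prefixes named above.
 */
int is_commentary_line(const char *line)
{
    static const char *prefixes[] = { "diff-tree ", "Author: ", "Date: " };
    size_t i;

    if (line[0] == '\0' || isspace((unsigned char)line[0]))
        return 1;
    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
        if (!strncmp(line, prefixes[i], strlen(prefixes[i])))
            return 1;
    return 0;
}
```

Anything for which this returns zero would be treated as a candidate diff line and parsed as before.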

So what I'd suggest is one (or both) of two possibilities:
 - make the internal diff logic also able to do the same rename handling 
   as the external diff-helper. This may or may not be complex, I've not 
   looked at it.
 - change diff-helper subtly: instead of printing "cannot parse %s", any
   nonrecognized line would be a "ignore this line, but process all
   pending potential renames".

The above would mean that I could either just do

	git-whatchanged include/asm-i386 arch/i386 | git-diff-helper

or continue to use

	git-whatchanged -p include/asm-i386 arch/i386

and still get the "nice" output (and it's a _feature_ that a rename within
the arch/i386 then shows up as a rename, but a rename that crosses the
boundary shows up as a "create" or "delete" event).

Comments? 

			Linus


* Re: [PATCH] improved delta support for git
From: Linus Torvalds @ 2005-05-18 15:12 UTC (permalink / raw)
  To: Dan Holmsand; +Cc: git
In-Reply-To: <d6dohe$dql$1@sea.gmane.org>

On Tue, 17 May 2005, Dan Holmsand wrote:
> 
> Therefore, I tried some other approaches. This one seemed to work
> best:
> 
> 1) I limit the maximum size of any delta to 10% of the size of the new
> version. That guarantees a big saving, as long as any delta is
> produced.
> 
> 2) If the "previous" version of a blob is a delta, I produce the new
> delta from the old delta's base version. This works surprisingly well.
> I'm guessing the reason for this is that most changes are really
> small, and they tend to be in the same area as a previous change (as
> in "Commit new feature. Commit bugfix for new feature. Commit fix for
> bugfix of new feature. Delete new feature as it doesn't work...").
> 
> 3) I use the same method for all tree objects.

Has anybody tried:

 4) don't limit yourself to previous-history-objects

One of the things I liked best about the delta patches was that it is
history-neutral, and can happily delta an object against any other random
object, in the same tree, in a future tree, or in a past tree.

Even without any history at all, there should be a noticeable amount of 
delta opportunities, as different architectures often end up sharing files 
that are quite similar, but not exactly the same.

Now, that's a very expensive thing to do, since it changes the question of
"which object should I delta against" from O(1) to O(n) (where "n" is the
total number of objects), and thus the whole deltafication from O(n) to
O(n**2), but especially together with your "max 10%" rule, you should be
able to limit your choices very effectively: if you know that your delta
should be within 10% of the total size, you can limit your "let's try that
object" search to other objects that are also within 10% of your object.

That doesn't change the basic expense factor much in theory (if sizes were
truly evenly distributed in <n> it might change it, but there's probably
only a few different "classes" of file sizes, much fewer than <n>, so it's
still probably O(n**2)), but it should cut down the work by some
noticeable constant factor, making it a hopefully realistic experiment.
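
To illustrate the size-based pruning, here is a minimal C sketch. It is not code from the delta patches: `struct obj_info`, `size_compatible`, `delta_acceptable` and `count_candidates` are hypothetical names, and reading "within 10%" as "the size gap is at most a tenth of the target's size" is an assumption about the exact bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-object summary; only the size matters here. */
struct obj_info {
	unsigned long size;
};

/* Is the candidate's size within 10% of the target's size? */
int size_compatible(unsigned long target, unsigned long candidate)
{
	unsigned long gap = target > candidate ?
		target - candidate : candidate - target;

	return gap <= target / 10;
}

/* The "max 10%" rule: keep a delta only if it is at most 10% of the target. */
int delta_acceptable(unsigned long delta_size, unsigned long target_size)
{
	return delta_size <= target_size / 10;
}

/*
 * Count the objects worth trying as a delta base for objs[self].
 * The scan itself is still O(n) per object, hence O(n**2) overall;
 * the size test only skips the expensive delta attempt.
 */
size_t count_candidates(const struct obj_info *objs, size_t n, size_t self)
{
	size_t i, count = 0;

	assert(objs != NULL && self < n);
	for (i = 0; i < n; i++) {
		if (i == self)
			continue;
		if (size_compatible(objs[self].size, objs[i].size))
			count++;
	}
	return count;
}
```

Sorting the objects by size first would let the scan visit only the neighbours inside the window, but as noted above, if sizes cluster into a few classes the worst case stays quadratic.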

So your first rule makes a global deltafication cheaper, and in fact,
together with your second rule, you might even decide to make the size 
differential depend on the size of the _compressed_ object, since you 
don't care about objects that have already been deltafied, and if they are 
within 10% of each other, then they should also likely compress similarly, 
and it should thus be pretty equivalent to just compare compressed sizes.

Again, that second optimization wouldn't change the O(n**2) nature of the
expense, but should give another nice factor of speedup, maybe making the
exercise possible in the first place.

As to the long-term "O(n**2) deltafication is not practical for big 
projects with lots of history" issue, doing things incrementally should 
hopefully solve that, and turn it into a series of O(n) operations at the 
cost of saying "we'll never re-delta an object against the future once 
we've found a delta in the past or used it as a base for a delta".

The fsck "scan all objects" code could be a good starting point.

Doing this experiment at least once should be interesting. It may turn out
that the incremental space savings aren't all that noticeable, and that
the pure history-based one already finds 90% of all savings, making the
expensive version not worth it. It would be nice to _know_, though.

		Linus

^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Linus Torvalds @ 2005-05-18 15:20 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <pan.2005.05.18.13.40.32.907488@smurf.noris.de>

On Wed, 18 May 2005, Matthias Urlichs wrote:

> Hi, Junio C Hamano wrote:
> > 
> > I'd agree what you said about "diff -git" in the rest of your
> > message makes the most sense.
> > 
> ... except, please use "diff --git".

Yes, that makes sense. It's not three flags "g" "i" and "t", it's the 
"git" flag.

		Linus

^ permalink raw reply

* Cogito updates?
From: Zack Brown @ 2005-05-18 14:53 UTC (permalink / raw)
  To: pasky; +Cc: git

Hi Petr,

I'm tracking rsync://rsync.kernel.org/pub/scm/cogito/cogito.git

I see no updates since May 15, but tons before that. Is my repo broken? From the
list traffic, it seems there are a lot of patches going in.

Be well,
Zack

-- 
Zack Brown

^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Matthias Urlichs @ 2005-05-18 13:40 UTC (permalink / raw)
  To: git
In-Reply-To: <7vsm0mlosf.fsf@assigned-by-dhcp.cox.net>

Hi, Junio C Hamano wrote:
> 
> I'd agree what you said about "diff -git" in the rest of your
> message makes the most sense.
> 
... except, please use "diff --git".

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de

^ permalink raw reply

* [PATCH] Kill a bunch of pointer sign warnings for gcc4
From: Brian Gerst @ 2005-05-18 12:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

- Raw hashes should be unsigned char.
- String functions want signed char.
- Hash and compress functions want unsigned char.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
----------------------------

PS.
tar-tree.c: In function ‘main’:
tar-tree.c:437: warning: pointer targets in passing argument 1 of 
‘write_header’ differ in signedness

The "0" looks bogus, since it should be a raw hash not a text string.


[-- Attachment #2: ptrsign.diff --]
[-- Type: text/x-patch, Size: 8063 bytes --]

Kill a bunch of pointer sign warnings for gcc4.

---
commit 9d6d4056081ea693b9d0b28a1507921328df0b26
tree 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3
parent 02481aec2a2cfce7bc47d0d10876be5507f0b7ba
author <bgerst@citadel.(none)> Wed, 18 May 2005 07:59:28 -0400
committer <bgerst@citadel.(none)> Wed, 18 May 2005 07:59:28 -0400

 cache.h      |    4 ++--
 diff-cache.c |    2 +-
 diff-files.c |    4 ++--
 http-pull.c  |    4 ++--
 ls-tree.c    |    2 +-
 read-cache.c |    2 +-
 rpush.c      |    2 +-
 sha1_file.c  |   18 +++++++++---------
 strbuf.h     |    2 +-
 tar-tree.c   |    6 +++---
 10 files changed, 23 insertions(+), 23 deletions(-)

Index: cache.h
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/cache.h  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/cache.h  (mode:100644)
@@ -143,7 +143,7 @@
 extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
 extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
-extern int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *return_sha1);
+extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
 
 extern int check_sha1_signature(unsigned char *sha1, void *buf, unsigned long size, const char *type);
 
@@ -167,7 +167,7 @@
 extern int cache_name_compare(const char *name1, int len1, const char *name2, int len2);
 
 extern void *read_object_with_reference(const unsigned char *sha1,
-					const unsigned char *required_type,
+					const char *required_type,
 					unsigned long *size,
 					unsigned char *sha1_ret);
 
Index: diff-cache.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/diff-cache.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/diff-cache.c  (mode:100644)
@@ -63,7 +63,7 @@
 {
 	unsigned int mode, oldmode;
 	unsigned char *sha1;
-	unsigned char old_sha1_hex[60];
+	char old_sha1_hex[60];
 
 	if (get_stat_data(new, &sha1, &mode) < 0) {
 		if (report_missing)
Index: diff-files.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/diff-files.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/diff-files.c  (mode:100644)
@@ -48,7 +48,7 @@
 }
 
 static void show_modified(int oldmode, int mode,
-			  const char *old_sha1, const char *sha1,
+			  const unsigned char *old_sha1, const unsigned char *sha1,
 			  char *path)
 {
 	char old_sha1_hex[41];
@@ -64,7 +64,7 @@
 
 int main(int argc, char **argv)
 {
-	static const char null_sha1[20] = { 0, };
+	static const unsigned char null_sha1[20] = { 0, };
 	int entries = read_cache();
 	int i;
 
Index: http-pull.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/http-pull.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/http-pull.c  (mode:100644)
@@ -24,7 +24,7 @@
 
 static size_t fwrite_sha1_file(void *ptr, size_t eltsize, size_t nmemb, 
 			       void *data) {
-	char expn[4096];
+	unsigned char expn[4096];
 	size_t size = eltsize * nmemb;
 	int posn = 0;
 	do {
@@ -49,7 +49,7 @@
 {
 	char *hex = sha1_to_hex(sha1);
 	char *filename = sha1_file_name(sha1);
-	char real_sha1[20];
+	unsigned char real_sha1[20];
 	char *url;
 	char *posn;
 
Index: ls-tree.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/ls-tree.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/ls-tree.c  (mode:100644)
@@ -24,7 +24,7 @@
 }
 
 static void list_recursive(void *buffer,
-			   const unsigned char *type,
+			   const char *type,
 			   unsigned long size,
 			   struct path_prefix *prefix)
 {
Index: read-cache.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/read-cache.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/read-cache.c  (mode:100644)
@@ -344,7 +344,7 @@
 }
 
 #define WRITE_BUFFER_SIZE 8192
-static char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned char write_buffer[WRITE_BUFFER_SIZE];
 static unsigned long write_buffer_len;
 
 static int ce_write(SHA_CTX *context, int fd, void *data, unsigned int len)
Index: rpush.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/rpush.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/rpush.c  (mode:100644)
@@ -6,7 +6,7 @@
 void service(int fd_in, int fd_out) {
 	ssize_t size;
 	int posn;
-	char sha1[20];
+	unsigned char sha1[20];
 	unsigned long objsize;
 	void *buf;
 	do {
Index: sha1_file.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/sha1_file.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/sha1_file.c  (mode:100644)
@@ -313,13 +313,13 @@
 	int ret, bytes;
 	z_stream stream;
 	char buffer[8192];
-	char *buf;
+	unsigned char *buf;
 
 	/* Get the data stream */
 	memset(&stream, 0, sizeof(stream));
 	stream.next_in = map;
 	stream.avail_in = mapsize;
-	stream.next_out = buffer;
+	stream.next_out = (unsigned char *)buffer;
 	stream.avail_out = sizeof(buffer);
 
 	inflateInit(&stream);
@@ -359,7 +359,7 @@
 }
 
 void *read_object_with_reference(const unsigned char *sha1,
-				 const unsigned char *required_type,
+				 const char *required_type,
 				 unsigned long *size,
 				 unsigned char *actual_sha1_return)
 {
@@ -403,20 +403,20 @@
 	}
 }
 
-int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *returnsha1)
+int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
 {
 	int size;
-	char *compressed;
+	unsigned char *compressed;
 	z_stream stream;
 	unsigned char sha1[20];
 	SHA_CTX c;
 	char *filename;
 	static char tmpfile[PATH_MAX];
-	char hdr[50];
+	unsigned char hdr[50];
 	int fd, hdrlen, ret;
 
 	/* Generate the header */
-	hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
+	hdrlen = sprintf((char *)hdr, "%s %lu", type, len)+1;
 
 	/* Sha1.. */
 	SHA1_Init(&c);
@@ -516,8 +516,8 @@
 	int local;
 	z_stream stream;
 	unsigned char real_sha1[20];
-	char buf[4096];
-	char discard[4096];
+	unsigned char buf[4096];
+	unsigned char discard[4096];
 	int ret;
 	SHA_CTX c;
 
Index: strbuf.h
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/strbuf.h  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/strbuf.h  (mode:100644)
@@ -4,7 +4,7 @@
 	int alloc;
 	int len;
 	int eof;
-	unsigned char *buf;
+	char *buf;
 };
 
 extern void strbuf_init(struct strbuf *);
Index: tar-tree.c
===================================================================
--- 2b3e8f627f4b8338e1479f6011052d2f6c0e2468/tar-tree.c  (mode:100644)
+++ 6da6a42bd7b97ea6ebd79544f4fb16713ac74dc3/tar-tree.c  (mode:100644)
@@ -205,7 +205,7 @@
 	append_char(p, '\n');
 }
 
-static void write_header(const char *, char, const char *, struct path_prefix *,
+static void write_header(const unsigned char *, char, const char *, struct path_prefix *,
                          const char *, unsigned int, void *, unsigned long);
 
 /* stores a pax extended header directly in the block buffer */
@@ -238,7 +238,7 @@
 	free(buffer);
 }
 
-static void write_global_extended_header(const char *sha1)
+static void write_global_extended_header(const unsigned char *sha1)
 {
 	char *p;
 	unsigned int size;
@@ -253,7 +253,7 @@
 }
 
 /* stores a ustar header directly in the block buffer */
-static void write_header(const char *sha1, char typeflag, const char *basepath,
+static void write_header(const unsigned char *sha1, char typeflag, const char *basepath,
                          struct path_prefix *prefix, const char *path,
                          unsigned int mode, void *buffer, unsigned long size)
 {


^ permalink raw reply
