Git development
* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-18 22:41 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, torvalds
In-Reply-To: <20050518213309.GD10358@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

>> $ pwd
>> /usr/src/linux/fs
>> $ git-run-with-user-path cg-commit -- ext?/Makefile
>> 
>> would work.

PB> Yes. But if you do just cg-commit in the subdirectory, it won't work.

The point of git-run-with-user-path is that it canonicalizes and
filters the paths, chdir(2)'s to GIT_PROJECT_TOP before running
cg-commit.  So when cg-commit starts in the above example,

    (1) its $cwd is /usr/src/linux and your .git subdirectory is
        right there in ./.git/
    (2) it gets fs/ext2/Makefile and fs/ext3/Makefile as arguments.
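
A minimal sketch of the path rewriting described above, not taken from the patch itself. The function name path_from_top is hypothetical. It assumes both directories are absolute, that the project top has no trailing slash, and that the path contains no ".." components, which a real helper would have to canonicalize.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite `path`, given relative to `cwd`, so that
 * it is relative to `top`.  `cwd` must be `top` itself or a directory
 * below it.  Returns 0 on success, -1 if cwd is outside top or the
 * output buffer is too small.
 */
static int path_from_top(const char *top, const char *cwd,
                         const char *path, char *out, size_t outlen)
{
    size_t toplen = strlen(top);
    const char *prefix;
    int n;

    if (strncmp(top, cwd, toplen) != 0)
        return -1;
    prefix = cwd + toplen;
    if (*prefix == '/')
        prefix++;               /* skip the separator after top */
    else if (*prefix != '\0')
        return -1;              /* e.g. top=/usr/src, cwd=/usr/srcfoo */

    if (*prefix)
        n = snprintf(out, outlen, "%s/%s", prefix, path);
    else
        n = snprintf(out, outlen, "%s", path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With top /usr/src/linux and cwd /usr/src/linux/fs, the argument ext2/Makefile becomes fs/ext2/Makefile, matching point (2) above.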

>> BTW, I am wondering if your choice of cg-commit as an example
>> (as opposed to something else like diff or add) is a flamebait
>> or just an innocent random example ;-)?

PB> It was completely innocent. :-) How would it be a flamebait?

<http://members.cox.net/junkio/per-file-commit.txt> ;-).


^ permalink raw reply

* Re: [PATCH cogito] "cg-whatsnew" command
From: Petr Baudis @ 2005-05-18 22:30 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Matthias Urlichs, git
In-Reply-To: <tnxis1jk1sn.fsf@arm.com>

Dear diary, on Mon, May 16, 2005 at 10:33:44AM CEST, I got a letter
where Catalin Marinas <catalin.marinas@arm.com> told me that...
> Matthias Urlichs <smurf@smurf.noris.de> wrote:
> >> +	cg-diff		[-p] [-r FROM_ID[:TO_ID]] [-m [BNAME] [BNAME]] [FILE]...
> >
> > That should be
> >
> > [-m [BNAME [BNAME]]]
> 
> You are right.
> 
> > though I'd suggest something more mnemonic than two BNAMEs.
> 
> Another try, see attached.

Unfortunately I can't comment on it well when it's not either in the
body or as text/plain attachment.

I think the -m usage doesn't make much sense now. What about dropping
branch1 and instead using what the user passed as the -r argument?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: README rewrite
From: Petr Baudis @ 2005-05-18 22:27 UTC (permalink / raw)
  To: Zack Brown; +Cc: git
In-Reply-To: <20050515044941.GB7391@tumblerings.org>

Dear diary, on Sun, May 15, 2005 at 06:49:41AM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> Here's an updated patch with fixes, apply instead of the one I just sent:
> 
> Signed-off-by: Zack Brown <zbrown@tumblerings.org>

Thanks. I've used the first part of the rewrite, tweaked it
substantially (I have some reservations about the style and suspicions
regarding the grammar), and somewhat awkwardly merged in the current
stuff missing in the rewrite (cg-diff and such).

I'd prefer the reference documentation in separate Documentation/ files,
much in the style of the GIT documentation.

Thanks again,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: README rewrite
From: Petr Baudis @ 2005-05-18 21:42 UTC (permalink / raw)
  To: Zack Brown; +Cc: Wink Saville, git
In-Reply-To: <20050516151604.GF7391@tumblerings.org>

Dear diary, on Mon, May 16, 2005 at 05:16:04PM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> So a branch is just a name for a cloned tree somewhere, the same as a tag is
> just a name for a revision some time in the past?

Very much so.

> On Sun, May 15, 2005 at 07:28:03PM +0200, Petr Baudis wrote:
> > So the local branch is the "master" branch, the rest are "remote"
> > branches. Note that there is theoretical support for multiple local
> > branches, but I decided not to make things even more confusing and there
> > is no Cogito interface for managing them now.
> 
> Is there anything about the repository that 'knows' which is the master branch,
> or is this just a matter of which person is in charge? So, if I have a project,
> and I have a Cogito repository, so far it's just me, and just one branch.
> 
> Then another person joins the project, and they clone my repository onto their
> local system, and give it their own branch name.
> 
> Now here is the question:
> 
> We decide that the other person is a better project leader, and we decide to use
> their branch as the master branch, and mine as just a remote branch.
> 
> Would that be normal Cogito behavior? i.e. there is nothing to distinguish a
> 'master' branch from any other except that it is the one everyone says is the
> master branch?

That's right. The "master" branch is just your local thing, as well as
naming of the remote branches. The "master" branch name means this is
the branch representing your working tree, not that it is the mainline
of the project or anything. If you fork Linus' tree, in your repository
your fork will be the "master" branch, and Linus' branch will be called
however you name it.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 0/4] Pulling refs files
From: Petr Baudis @ 2005-05-18 21:35 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505171802570.30848-100000@iabervon.org>

Dear diary, on Wed, May 18, 2005 at 12:20:40AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Tue, 17 May 2005, Petr Baudis wrote:
> 
> > Dear diary, on Tue, May 17, 2005 at 11:20:54PM CEST, I got a letter
> > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > Hmm... maybe the right thing is to make the implementation-provided
> > > transfer code handle arbitrary things in GIT_DIR, but have code for
> > > updating reference files atomically and using a reference file to start
> > > from use "refs/"? Certainly, there's nothing special about reference files
> > > in transit.
> > > 
> > > Certainly the things in the info/ directory shouldn't be treated as a head
> > > that you're going to pull, so that has to be different above the protocol
> > > level anyway.
> > 
> > *confused* :) I'm sorry, I have trouble understanding this. Could you
> > rephrase, please?
> 
> If you want to get info/ignore, you want to get it and save it, not
> download a set of objects it refers to. So it's different from specifying
> that you want to use refs/heads/master as the starting point for a pull.

Obviously. I think you should need to "explicitly" tell pull to actually
save any files locally, since you (I mean Cogito) certainly do not
want the pull stuff to touch the local refs/heads/master - it wants it
in some other file.

> > > So the remote receiver should get an instruction: change X from OLD to NEW
> > > and pull NEW. It should:
> > > 
> > >  - lock the file against further updates
> > >  - check that the current value is the provided OLD
> > >  - pull the necessary objects
> > >  - write NEW to the file
> > - unlock the file ;-))
> 
> The way I'm actually doing things is to write NEW into the lock file at
> some arbitrary point, and "writing to the file" is actually renaming the
> lock file to the normal filename. So writing unlocks the file
> automatically.

Ah. Obviously. That makes sense. :-)
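
A rough sketch of the lock-file-then-rename scheme discussed above. The names update_ref and holds_value, and the ".lock" suffix, are assumptions for illustration and not the code from the patch series. The object transfer is left out, and the reference file is assumed to exist already.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if `path` currently holds exactly `expected`, else 0. */
static int holds_value(const char *path, const char *expected)
{
    char cur[256];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;
    n = read(fd, cur, sizeof cur - 1);
    close(fd);
    if (n < 0)
        return 0;
    cur[n] = '\0';
    return strcmp(cur, expected) == 0;
}

/*
 * Hypothetical sketch: change `path` from old_val to new_val.
 * Returns 0 on success, -1 if the lock is taken, the current value is
 * not old_val, or an I/O step fails.
 */
static int update_ref(const char *path, const char *old_val, const char *new_val)
{
    char lock[4096];
    size_t len = strlen(new_val);
    int n, fd, ok;

    n = snprintf(lock, sizeof lock, "%s.lock", path);
    if (n < 0 || (size_t)n >= sizeof lock)
        return -1;

    /* O_EXCL makes creation fail while another updater holds the lock. */
    fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1;

    ok = holds_value(path, old_val);
    /* (The objects needed by NEW would be pulled at this point.) */
    if (ok && write(fd, new_val, len) != (ssize_t)len)
        ok = 0;
    if (close(fd) < 0)
        ok = 0;

    /* rename() replaces path atomically, which also releases the lock. */
    if (!ok || rename(lock, path) < 0) {
        unlink(lock);           /* give up and leave path untouched */
        return -1;
    }
    return 0;
}
```

Because the new value only ever appears under the final name through rename(), a reader sees either the old contents or the new ones, never a partial write.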

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Petr Baudis @ 2005-05-18 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds
In-Reply-To: <7vzmutqz5f.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Wed, May 18, 2005 at 12:13:32AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> PB> But that won't work well enough for me. E.g. when committing in a
> PB> subdirectory, I want to commit only changes made in the subdirectory,
> PB> etc.
> 
> Assuming that you have something that lets you commit selected
> files when you are at the top level (say cg-commit), and further
> assuming that today it only works from the toplevel, that is:
> 
>     $ pwd
>     /usr/src/linux
>     $ cg-commit fs/ext?/Makefile
> 
> works today, what I am saying is:
> 
>     $ pwd
>     /usr/src/linux/fs
>     $ git-run-with-user-path cg-commit -- ext?/Makefile
> 
> would work.

Yes. But if you do just cg-commit in the subdirectory, it won't work.
You could pass the original directory in some environment variable or
whatever, but I think that's just not worth the trouble for Cogito -
it's much easier for it when you just stay in the directory you are in
and instead set the environment variables so that the git toolkit DTRT.
(I like this acronym. :-)

> BTW, I am wondering if your choice of cg-commit as an example
> (as opposed to something else like diff or add) is a flamebait
> or just an innocent random example ;-)?

It was completely innocent. :-) How would it be a flamebait?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* [PATCH] fix show_date() for positive timezones
From: Nicolas Pitre @ 2005-05-18 21:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git


Signed-off-by: Nicolas Pitre <nico@cam.org>

--- a/date.c
+++ b/date.c
@@ -51,9 +51,9 @@ const char *show_date(unsigned long time
 	int minutes;
 
 	minutes = tz < 0 ? -tz : tz;
-	minutes = (tz / 100)*60 + (tz % 100);
+	minutes = (minutes / 100)*60 + (minutes % 100);
 	minutes = tz < 0 ? -minutes : minutes;
-	t = time - minutes * 60;
+	t = time + minutes * 60;
 	tm = gmtime(&t);
 	if (!tm)
 		return NULL;
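
For illustration, the corrected arithmetic can be pulled out into a small standalone sketch. The function names tz_to_minutes and shift_to_local are hypothetical and do not appear in date.c; the timezone is assumed to be stored as a signed decimal HHMM value such as 530 or -700.

```c
#include <time.h>

/* Convert a signed HHMM-style offset (e.g. 530, -700) to signed minutes. */
static int tz_to_minutes(int tz)
{
    int minutes = tz < 0 ? -tz : tz;            /* work on the magnitude */

    minutes = (minutes / 100) * 60 + (minutes % 100);
    return tz < 0 ? -minutes : minutes;         /* restore the sign */
}

/* Shift a UTC timestamp so that gmtime() on the result gives local fields. */
static time_t shift_to_local(time_t utc, int tz)
{
    return utc + (time_t)tz_to_minutes(tz) * 60;
}
```

The offset is added because zones east of UTC, which have positive offsets, are ahead of UTC.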

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 20:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <7vll6cnup4.fsf@assigned-by-dhcp.cox.net>



On Wed, 18 May 2005, Junio C Hamano wrote:
> 
> I suspect doing something like this might be saner instead,
> assuming non raw-diffs come at the end.  

It won't ever trigger, since we only exit the loop once we see EOF.

So the non-diffs at the end will trigger by the exact same "oh, we didn't 
recognize it" logic.

		Linus

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Junio C Hamano @ 2005-05-18 20:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505181134470.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I took the liberty of doing just that. The only subtle issue was that 
LT> the strbuf functions would consider an empty line to be EOF, which looked 
LT> wrong and unintentional. Fixing that made the actual diff-helper changes 
LT> totally trivial, and I can now do
LT> 	git-rev-list HEAD | git-diff-tree -r -v --stdin | ./git-diff-helper -r | less -S
LT> and it does the right thing for me.

Thanks for fixing up strbuf.

@@ -136,8 +268,12 @@ int main(int ac, const char **av) {
 		if (sb.eof)
 			break;
 		status = parse_diff_raw_output(sb.buf, av+1, ac-1, reverse);
-		if (status)
-			fprintf(stderr, "cannot parse %s\n", sb.buf);
+		if (status) {
+			flush_renames(av+1, ac-1, reverse);
+			printf("%s%c", sb.buf, line_termination);
+		}
 	}
+
+	flush_renames(av+1, ac-1, reverse);
 	return 0;
 }

I suspect doing something like this might be saner instead,
assuming non raw-diffs come at the end.  

		if (status)
			break;
	}
	flush_renames(av+1, ac-1, reverse);
	if (!sb.eof) {
		spit out what we have in sb.buf, sendfile ;-) the
		rest of the input to the output.
	}
	return 0;


^ permalink raw reply

* git-diff-tree updates..
From: Linus Torvalds @ 2005-05-18 20:28 UTC (permalink / raw)
  To: Git Mailing List


I've just fixed two annoyances of mine with git-diff-tree, which sadly 
caused me to break some syntax.

In particular, diff-tree for some unfathomable reason (probably incipient 
braindamage in yours truly) used a single dash "-" to mark the end of 
command line arguments, rather than the "--" that everybody else uses.

I hope nobody depended on it, because I fixed it.

The other thing I did was to allow a single SHA1, and then consider that 
to be equivalent to a one-line "--stdin" thing. Ie you can now do

	git-diff-tree -v -p HEAD

and it will do what you'd expect it to do, ie it should be equivalent to

	cat .git/HEAD | git-diff-tree -v -p --stdin

(apart from a silly bug which I'll fix shortly).

The latter means that if you actually want to track a _file_ named HEAD
(or anything else that might trigger as a reference), you'd need to do

	git-rev-list HEAD | git-diff-tree -v -p --stdin -- HEAD

but I'm considering making the single-dash thing be equivalent to 
the combination "--stdin", and not allow SHA1 naming after it, so that 
this could be shortened to just be

	git-rev-list HEAD | git-diff-tree -v -p - HEAD

(but I wanted to make the "-" semantics change be a two-phase thing).
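
As a small sketch of the "--" convention mentioned above: everything after the first "--" is treated as a path, never as an option or a SHA1 name. The helper first_path_arg is hypothetical and is not how git-diff-tree actually parses its arguments.

```c
#include <string.h>

/*
 * Return the index of the first path argument, i.e. the one following
 * the first "--", or argc if no "--" is present.
 */
static int first_path_arg(int argc, const char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], "--") == 0)
            return i + 1;
    return argc;
}
```

For the command line "git-diff-tree -v -p --stdin -- HEAD" this returns 5, so HEAD is taken as a file name rather than as a reference.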

		Linus

^ permalink raw reply

* Re: [PATCH] improved delta support for git
From: Dan Holmsand @ 2005-05-18 19:32 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.62.0505181428170.20274@localhost.localdomain>

Nicolas Pitre wrote:
> On Wed, 18 May 2005, Dan Holmsand wrote:
>>Nicolas Pitre wrote:
>>It's probably better to skip deltafication of very small files altogether. Big
>>pain for small gain, and all that.
> 
> No, that's not what I mean.
> 
> Suppose a large source file that may change only one line between two 
> versions.  The delta may therefore end up being only a few bytes long.  
> Compressing a few bytes with zlib creates a _larger_ file than the 
> original few bytes.

Oh, I see. You're right, of course. I doubt, however, that there are 
really large gains to be made, measured in bytes; since small files tend 
to be small :-) It might nevertheless be worthwhile to skip processing 
of really small files, though, as there's basically no hope of gaining 
anything by deltafication.

Anyway, I've also noticed that deltas compress a lot worse than regular 
blobs. In particular, "complex deltas" (i.e. small changes on lines 
1,3,5,9,12, etc.), compress poorly. That might explain why my simplistic 
"depth-one-deltas-against-a-common-keyframe" method works comparatively 
well, as that should tend to produce cleaner deltas from time to time (i.e. 
chunk on lines 1-12 got replaced by new stuff), that ought to be easier 
to compress.

>>>Well, any delta object smaller than its original object saves space, even if
>>>it's 75% of the original size. But...
>>
>>That's not true if you want to keep the delta chain length down (and thus
>>performance up).
> 
> Sure.  That's why I added the -d switch to mkdelta.  But if you can fit 
> a delta which is 75% the size of its original object size then you still 
> save 25% of the space, regardless of the delta chain length.

Ok, I guess we're talking about slightly different things here. I was 
talking about my simple-and-fast method of processing one new version of 
an object at a time, deltafying each new version against the first 
previous non-deltafied one. If you, in that scenario, allow use of up to 
75% sized deltas, on average delta size will probably be something like 37%.
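
The one-version-at-a-time scheme described above might be sketched roughly as follows. This is an illustration under assumptions, not the actual script: plan_deltas and the delta_size_fn callback are hypothetical names, and toy_delta is a made-up cost model that pretends every revision step away from the base adds 40 bytes of delta.

```c
#include <stddef.h>

/* Hypothetical callback: bytes needed for a delta from version `base` to `obj`. */
typedef size_t (*delta_size_fn)(size_t base, size_t obj);

/* Toy cost model for the example only: 40 bytes per revision step. */
static size_t toy_delta(size_t base, size_t obj)
{
    return (obj - base) * 40;
}

/*
 * Walk versions 0..n-1 in order.  is_delta[i] becomes 1 when version i is
 * stored as a depth-one delta against the most recent full ("keyframe")
 * version, and 0 when it is stored whole and becomes the new keyframe.
 * max_percent is the largest delta allowed, as a percentage of the
 * version's own size.  Returns the total number of bytes stored.
 */
static size_t plan_deltas(const size_t *sizes, size_t n, delta_size_fn delta_size,
                          unsigned max_percent, int *is_delta)
{
    size_t total = 0, key = 0, i;

    for (i = 0; i < n; i++) {
        size_t d = i ? delta_size(key, i) : 0;

        if (i && d * 100 <= sizes[i] * max_percent) {
            is_delta[i] = 1;
            total += d;
        } else {
            is_delta[i] = 0;    /* change too large: start a new keyframe */
            key = i;
            total += sizes[i];
        }
    }
    return total;
}
```

With six 1000-byte versions and a 10% limit, the toy model stores versions 0 and 3 whole and the rest as deltas. A looser limit means fewer keyframes but larger deltas, which is the trade-off being discussed.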

> In fact it seems that deltas might be significantly harder to compress.  
> Therefore a test on the resulting file should probably be done as well 
> to make sure we don't end up with a delta larger than the original 
> object.

Yeah (see above). That's just one of the things I cheated my way out of, 
by limiting delta size to 10% (I'm trusting that compression doesn't 
increase size by 90%).

>>>... but then the ultimate solution is to try out all possible references
>>>within a given list.  My git-deltafy-script already finds out the list of
>>>objects belonging to the same file.  Maybe git-mkdelta should try all
>>>combinations between them.  This way a deeper delta chain could be allowed
>>>for maximum space saving.
>>
>>Yeah. But then you lose the ability to do incremental deltafication, or
>>deltafication on-the-fly.
> 
> 
> Not at all.  Nothing prevents you from making the latest revision of a 
> file be the reference object and the previous revision turned into a 
> delta against that latest revision, even if it was itself a reference 
> object before.  The only thing that must be avoided is a delta loop and 
> current mkdelta code takes care of that already.

Sure. But there's some downside to modifying already existing objects. 
In particular, downloads using the existing methods (both http and 
rsync) won't get your new, smaller objects. If someone is pulling from a 
deltafied repository often enough, and the objects of the top-most 
commit are always non-deltafied, they will never see any deltas. Object 
immutability is a really good thing.

And there might be some performance issues involved, if you'd like to do 
deltafying at commit-time (not that I've actually tried this, or 
anything)...

Thanks for your comments!

/dan


^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Thomas Glanzmann @ 2005-05-18 18:52 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.58.0505181134470.18337@ppc970.osdl.org>

Hello,

> 	git-rev-list HEAD | git-diff-tree -r -v --stdin | ./git-diff-helper -r | less -S

nice one!

	Thomas

^ permalink raw reply

* 'git resolve'
From: Thomas Glanzmann @ 2005-05-18 18:50 UTC (permalink / raw)
  To: GIT

Hello,
does anyone out there have a git-resolve script, one that guides the user
through handling whatever the automatic merge or the three-way merge
could not resolve?

	Thomas

^ permalink raw reply

* Re: [PATCH] improved delta support for git
From: Nicolas Pitre @ 2005-05-18 18:41 UTC (permalink / raw)
  To: Dan Holmsand; +Cc: git
In-Reply-To: <d6evrk$jv2$1@sea.gmane.org>

On Wed, 18 May 2005, Dan Holmsand wrote:

> Nicolas Pitre wrote:
> > One thing I've been wondering about is whether gzipping small deltas is
> > actually a gain.  For very small files it seems that gzip is adding more
> > overhead making the compressed file actually larger.  Might be worth storing
> > some deltas uncompressed if the compressed version turns out to be larger.
> 
> It's probably better to skip deltafication of very small files altogether. Big
> pain for small gain, and all that.

No, that's not what I mean.

Suppose a large source file that may change only one line between two 
versions.  The delta may therefore end up being only a few bytes long.  
Compressing a few bytes with zlib creates a _larger_ file than the 
original few bytes.

> > Well, any delta object smaller than its original object saves space, even if
> > it's 75% of the original size. But...
> 
> That's not true if you want to keep the delta chain length down (and thus
> performance up).

Sure.  That's why I added the -d switch to mkdelta.  But if you can fit 
a delta which is 75% the size of its original object size then you still 
save 25% of the space, regardless of the delta chain length.

> But in this case, the trick is to know when to stop deltafying against one
> base file, and start over with another. If you switch to a new keyframe too
> often, you obviously lose some potential savings. But if you don't switch
> often enough, you end up repeating the same data in too many delta files.

That's why multiple combinations should be tried.  And to keep things 
under control then a new argument specifying the delta "distance" might 
limit the number of trials.

> A maximum delta size of 10% turned out to be ideal for at least the "fs"
> tree. 8% was significantly worse, as was 15%. (The ideal size depends on how
> big the average change is: the smaller the average change, the smaller the max
> delta size should be).

In fact it seems that deltas might be significantly harder to compress.  
Therefore a test on the resulting file should probably be done as well 
to make sure we don't end up with a delta larger than the original 
object.

> > ... but then the ultimate solution is to try out all possible references
> > within a given list.  My git-deltafy-script already finds out the list of
> > objects belonging to the same file.  Maybe git-mkdelta should try all
> > combinations between them.  This way a deeper delta chain could be allowed
> > for maximum space saving.
> 
> Yeah. But then you lose the ability to do incremental deltafication, or
> deltafication on-the-fly.

Not at all.  Nothing prevents you from making the latest revision of a 
file be the reference object and the previous revision turned into a 
delta against that latest revision, even if it was itself a reference 
object before.  The only thing that must be avoided is a delta loop and 
current mkdelta code takes care of that already.


Nicolas

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 18:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505181110480.18337@ppc970.osdl.org>



On Wed, 18 May 2005, Linus Torvalds wrote:
> 
> If diff-helper just passes the lines it doesn't understand through
> unmodified (_after_ having handled any pending rename logic), it will 
> automatically do the right thing.

I took the liberty of doing just that. The only subtle issue was that 
the strbuf functions would consider an empty line to be EOF, which looked 
wrong and unintentional. Fixing that made the actual diff-helper changes 
totally trivial, and I can now do

	git-rev-list HEAD | git-diff-tree -r -v --stdin | ./git-diff-helper -r | less -S

and it does the right thing for me.
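
The pass-through behaviour described above could look roughly like this. It is a simplified sketch, not the diff-helper code: is_raw_diff is a stand-in for the real parser (here any line starting with ':' counts as a raw diff line, which is purely an assumption), and flushing the rename pool is reduced to printing a "[flush N]" marker.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real raw-diff parser. */
static int is_raw_diff(const char *line)
{
    return line[0] == ':';
}

/* Append s to out unless it would overflow; text that does not fit is dropped. */
static void append(char *out, size_t outlen, const char *s)
{
    if (strlen(out) + strlen(s) < outlen)
        strcat(out, s);
}

/*
 * Filter n input lines into out.  Recognized lines are pooled (only
 * counted here).  An unrecognized line, or the end of input, first
 * flushes the pool and is then copied through unmodified.
 */
static void filter_lines(const char **lines, size_t n, char *out, size_t outlen)
{
    size_t pending = 0, i;
    char marker[32];

    if (outlen == 0)
        return;
    out[0] = '\0';
    for (i = 0; i <= n; i++) {
        if (i < n && is_raw_diff(lines[i])) {
            pending++;          /* keep for later rename matching */
            continue;
        }
        if (pending) {
            snprintf(marker, sizeof marker, "[flush %lu]\n",
                     (unsigned long)pending);
            append(out, outlen, marker);
            pending = 0;
        }
        if (i < n) {
            append(out, outlen, lines[i]);
            append(out, outlen, "\n");
        }
    }
}
```

An empty line is treated like any other unrecognized line and does not end the input, in line with the strbuf fix mentioned above.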

		Linus

^ permalink raw reply

* Re: Core and Not-So Core
From: Juliusz Chroboczek @ 2005-05-18 18:35 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <2cfc40320505100800426d38ca@mail.gmail.com>

> To give a concrete example: the cache currently contains most of the
> posix stat structure primarily to allow quick change detection. In the
> Java world, most of the posix stat structure is not directly
> accessible via the pure-Java file system abstractions. However, for
> most purposes detecting changes to files modification time and file
> size would be enough.

I've got exactly this problem in Darcs-git; and I ignore all of the
cached data except the file size, mtime and sha1.  I don't currently
ever write to the cache.
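
A minimal sketch of change detection using only file size and mtime, as described above. The struct cached_stat and the function looks_changed are hypothetical names and do not reflect the layout of the Git cache entry.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cache record holding only the fields consulted here. */
struct cached_stat {
    off_t  size;
    time_t mtime;
};

/*
 * Returns 1 if the file looks changed (or cannot be stat'ed), and 0 if
 * its size and mtime still match the cached values.
 */
static int looks_changed(const char *path, const struct cached_stat *c)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return 1;
    return st.st_size != c->size || st.st_mtime != c->mtime;
}
```

A file reported as possibly changed would still need its sha1 recomputed to confirm; this check only lets unchanged files be skipped cheaply.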

> I think it would be worthwhile if care was taken to draw a distinction
> between the repository and the cache aspects of the git core, perhaps
> even going to the extreme of moving all knowledge of the cache into
> cogito itself.

There's nothing that prevents you from ignoring the Git cache and
using your own cache instead.

                                        Juliusz


^ permalink raw reply

* Re: Cogito updates?
From: Petr Baudis @ 2005-05-18 18:26 UTC (permalink / raw)
  To: Zack Brown; +Cc: git
In-Reply-To: <20050518145325.GG7391@tumblerings.org>

Dear diary, on Wed, May 18, 2005 at 04:53:25PM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> Hi Petr,

Hello,

> I'm tracking rsync://rsync.kernel.org/pub/scm/cogito/cogito.git
> 
> I see no updates since May 15, but tons before that. Is my repo broken? From the
> list traffic, it seems there are a lot of patches going in.

those patches are mostly going to git-pb. Cogito will get its turn on
Friday afternoon.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 18:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <7v64xgpgb0.fsf@assigned-by-dhcp.cox.net>



On Wed, 18 May 2005, Junio C Hamano wrote:
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> However, git-diff-helper doesn't understand these things, and the builtin
> LT> diff doesn't do the rename thing. Yet it would be very very useful to do.
> 
> It is unclear what you meant by "these things" in "doesn't
> understand these things", and what you meant by "it" in "it
> would be very very useful to do."  Could you explain?

"These things" being the extra output from "diff-tree" that is not a "diff 
line".

If diff-helper just passes the lines it doesn't understand through
unmodified (_after_ having handled any pending rename logic), it will 
automatically do the right thing.

> About the built-in diff not doing the rename, I have a bit
> longer term (knowing _my_ timescale I'd imagine you would
> understand that is not that long ;-) plan to have -p option for
> diff-* family to use the same rename detection logic that I
> added to diff-helper in the patch you are commenting on.

Goodie. I was hoping that was the case, but felt that the diff-helper 
thing should be pretty easy to do.

			Linus

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Junio C Hamano @ 2005-05-18 17:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505180821470.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> However, git-diff-helper doesn't understand these things, and the builtin
LT> diff doesn't do the rename thing. Yet it would be very very useful to do.

It is unclear what you meant by "these things" in "doesn't
understand these things", and what you meant by "it" in "it
would be very very useful to do."  Could you explain?

About the built-in diff not doing the rename, I have a bit
longer term (knowing _my_ timescale I'd imagine you would
understand that is not that long ;-) plan to have -p option for
diff-* family to use the same rename detection logic that I
added to diff-helper in the patch you are commenting on.  It
involves slight change to callers (three diff-* family main
programs) to add a call to tell the diff driver "I've given you
all the diffs, now go look for renames" at the end, and the
rest is changes to what diff.c does internally.

LT> So what I'd suggest is one (or both) of two possibilities:
LT>  - make the internal diff logic also able to do the same rename handling 
LT>    as the external diff-helper. This may or may not be complex, I've not 
LT>    looked at it.

Yes that is part of the plan.  I wanted to do things in these
steps:

  - Put rename detect in helper so screwups there would not
    impact the diff-* family's built-in output, with the initial
    dumb rename detection.

  - Improve rename detection still keeping the logic and
    machinery in diff-helper only.  I expect a heuristic
    similar to the one you posted on the deltification thread
    would work nicely here as well.

  - Straighten out the GIT_EXTERNAL_DIFF interface so that it
    can also express renames (the patch I sent currently punts
    there).  They will get an eighth argument (rename destination)
    only when they are being fed a rename patch.
    git-apply-patch-script needs to be adjusted for this change.

  - Change diff-helper not to do the rename detection itself,
    but clean it up so it uses the same diff_addremove(),
    diff_change(), and diff_unmerge() interface the diff-*
    family use.  Change the implementation of these three
    functions so that they do not directly call
    run_external_diff() but pool changes for later matching when
    rename detection is in effect.  Add diff_finished() which
    would flush the rename candidate pools, and call that at the
    end of program from three diff-* family and diff-helper.
    The rename detection logic in diff-helper will be moved to
    this "inspect the pooled rename candidates, match them up
    and flush" part.

The patch I sent is the first step in the above sequence.

LT>  - change diff-helper subtly: instead of printing "cannot parse %s", any
LT>    nonrecognized line would be a "ignore this line, but process all
LT>    pending potential renames".

Once the built-in diff driver is straightened out the way I
outlined above, this change may turn out to be unnecessary, I
need to look at the whatchanged output and think a bit more
about this later today.


^ permalink raw reply

* Re: [PATCH] improved delta support for git
From: Dan Holmsand @ 2005-05-18 17:15 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.58.0505180754060.18337@ppc970.osdl.org>

Linus Torvalds wrote:
> Has anybody tried:
> 
>  4) don't limit yourself to previous-history-objects
> 
> One of the things I liked best about the delta patches was that it is
> history-neutral, and can happily delta an object against any other random
> object, in the same tree, in a future tree, or in a past tree.
> 
> Even without any history at all, there should be a noticeable amount of 
> delta opportunities, as different architectures often end up sharing files 
> that are quite similar, but not exactly the same.
> 
> Now, that's a very expensive thing to do, since it changes the question of
> "which object should I delta against" from O(1) to O(n) (where "n" is the
> total number of objects), and thus the whole deltafication from O(n) to
> O(n**2), but especially together with your "max 10%" rule, you should be
> able to limit your choices very effectively: if you know that your delta
> should be within 10% of the total size, you can limit your "let's try that
> object" search to other objects that are also within 10% of your object.

Yeah, that sounds very interesting *and* very expensive...
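
One way to picture the size-based pruning suggested in the quoted text: with objects sorted by size, the candidates for a given object form a contiguous window. The sketch below is an assumption about how "within 10%" could be read (size within plus or minus that percentage of the object's own size), and size_window is a hypothetical name.

```c
#include <stddef.h>

/*
 * Given object sizes sorted in ascending order, find the half-open index
 * range [*lo, *hi) of objects whose size is within `percent` of
 * sizes[idx].  The range always contains idx itself.
 */
static void size_window(const size_t *sizes, size_t n, size_t idx,
                        unsigned percent, size_t *lo, size_t *hi)
{
    size_t slack = sizes[idx] * percent / 100;
    size_t min = sizes[idx] - slack;
    size_t max = sizes[idx] + slack;
    size_t l = idx, h = idx + 1;

    while (l > 0 && sizes[l - 1] >= min)
        l--;
    while (h < n && sizes[h] <= max)
        h++;
    *lo = l;
    *hi = h;
}
```

Only the objects inside the window would be tried as delta bases, so the per-object cost depends on how many objects have a similar size rather than on the total object count.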

Ideally, I'd like to find the set of objects that should *not* be 
deltafied (i.e. the ideal "keyframe" objects), but that would generate 
the maximum number of small, depth-one deltas with the least total size. 
But I can't really see how that could be done in a number of 
deltafications significantly less than the number of atoms in the 
universe. Let me think about that some more, though.

I'd like to try a couple of other approaches anyway:

a) Sort all objects by size. Start by biggest (or smallest), and try to 
get as many max-10%-deltas out of that as possible, stopping the search 
when objects get too small (big) according to some size difference 
limit. Cross already deltafied objects off the list, and continue. Might 
work, and might be fast enough with a sufficiently small size limit.

b) Use the same history-based approach as before, and in addition try to 
deltafy any "new" objects against other new objects and previous ones 
(say one or two commits back) in a given size range. That should catch 
renames, copies of the same template, etc. That shouldn't really affect 
performance, as new files are added comparatively seldom.

> Doing this experiment at least once should be interesting. It may turn out
> that the incremental space savings aren't all that noticeable, and that
> the pure history-based one already finds 90% of all savings, making the
> expensive version not worth it. It would be nice to _know_, though.

I definitely agree. And I also agree that the history-based 
deltafication seems less than pure, from a "git, the object store that 
doesn't really care" point of view.

On the other hand, the history-based thing has its advantages. It takes 
advantage of people's hard work to make patches as small as possible. 
It's fast. And (perhaps more importantly), it's deterministic. The 
"ideal" approach could possibly require every single blob to be 
redeltafied when a new object is added, if we want to stay ideal.

And it could be done at commit-time, thus keeping git's nice promise of 
immutable files, while still keeping size requirements down. And as my 
current method gives roughly an 80% size reduction over "plain git", 
that might (by boring, excessively practical people) be considered 
enough :-)

/dan


^ permalink raw reply

* [PATCH] compile fixes for solaris: include limits.h one more time
From: Thomas Glanzmann @ 2005-05-18 16:46 UTC (permalink / raw)
  To: GIT

[-- Attachment #1: Type: text/plain, Size: 131 bytes --]

[PATCH] compile fixes for solaris: include limits.h one more time

Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 204 bytes --]

--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -34,6 +34,7 @@
  */
 #include <sys/types.h>
 #include <dirent.h>
+#include <limits.h>
 #include "cache.h"
 
 static int force = 0, quiet = 0, not_new = 0;

^ permalink raw reply

* Build change for Darcs-git
From: Juliusz Chroboczek @ 2005-05-18 16:46 UTC (permalink / raw)
  To: darcs-devel, Git Mailing List, darcs-devel

Note to anyone using Darcs-git: from now on you need to
``configure --enable-git'' in order to get Git support.

                                        Juliusz

^ permalink raw reply

* [PATCH] Fix diff output take #4.
From: Junio C Hamano @ 2005-05-18 16:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matthias Urlichs, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505180819190.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Yes, that makes sense. It's not three flags "g" "i" and "t", it's the 
LT> "git" flag.

Concurred.  This is against the tip of your tree.  Pasky already
has a version with '-git' in his tree but I trust he can deal
with that single byte change locally.

------------
[PATCH] Fix diff output take #4.

This implements the output format suggested by Linus in
<Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>, except the
imaginary diff option is spelled "diff --git" with double
dashes.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff.c                 |   14 +++++++-------
t/t4000-diff-format.sh |    7 +++++--
2 files changed, 12 insertions(+), 9 deletions(-)

diff -git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -83,7 +83,6 @@
 			 struct diff_tempfile *temp)
 {
 	int i, next_at;
-	const char *git_prefix = "# mode: ";
 	const char *diff_cmd = "diff -L'%s%s' -L'%s%s'";
 	const char *diff_arg  = "'%s' '%s'||:"; /* "||:" is to return 0 */
 	const char *input_name_sq[2];
@@ -123,15 +122,16 @@
 	next_at += snprintf(cmd+next_at, cmd_size-next_at,
 			    diff_arg, input_name_sq[0], input_name_sq[1]);
 
+	printf("diff --git a/%s b/%s\n", name, name);
 	if (!path1[0][0])
-		printf("%s. %s %s\n", git_prefix, temp[1].mode, name);
+		printf("new file mode %s\n", temp[1].mode);
 	else if (!path1[1][0])
-		printf("%s%s . %s\n", git_prefix, temp[0].mode, name);
+		printf("deleted file mode %s\n", temp[0].mode);
 	else {
-		if (strcmp(temp[0].mode, temp[1].mode))
-			printf("%s%s %s %s\n", git_prefix,
-			       temp[0].mode, temp[1].mode, name);
-
+		if (strcmp(temp[0].mode, temp[1].mode)) {
+			printf("old mode %s\n", temp[0].mode);
+			printf("new mode %s\n", temp[1].mode);
+		}
 		if (strncmp(temp[0].mode, temp[1].mode, 3))
 			/* we do not run diff between different kind
 			 * of objects.
diff -git a/t/t4000-diff-format.sh b/t/t4000-diff-format.sh
--- a/t/t4000-diff-format.sh
+++ b/t/t4000-diff-format.sh
@@ -26,7 +26,9 @@
     'git-diff-files -p after editing work tree.' \
     'git-diff-files -p >current'
 cat >expected <<\EOF
-# mode: 100644 100755 path0
+diff --git a/path0 b/path0
+old mode 100644
+new mode 100755
 --- a/path0
 +++ b/path0
 @@ -1,3 +1,3 @@
@@ -34,7 +36,8 @@
  Line 2
 -line 3
 +Line 3
-# mode: 100755 . path1
+diff --git a/path1 b/path1
+deleted file mode 100755
 --- a/path1
 +++ /dev/null
 @@ -1,3 +0,0 @@
------------------------------------------------



^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Matthias Urlichs @ 2005-05-18 16:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <Pine.LNX.4.58.0505180819190.18337@ppc970.osdl.org>

Hi,

Linus Torvalds:
>  It's not three flags "g" "i" and "t",

Or the (hitherto nonexisting, at least in GNU diff) '-g' flag with an
"it" argument. ;-)

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Linus Torvalds @ 2005-05-18 15:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pasky
In-Reply-To: <7v3bslqc94.fsf@assigned-by-dhcp.cox.net>



On Tue, 17 May 2005, Junio C Hamano wrote:
>
> This is just a cover letter but the next patch implements the
> rename detection I told you about.

Ok. I now have one more worry: git-diff-tree with "--stdin".

I actually find that thing to be supremely useful, and I use it for not 
just my silly git-whatchanged script, but also to make release notes. I 
just do

	git-rev-tree HEAD ^LAST_RELEASE | cut -d' ' -f2 | git-diff-tree -v -s --stdin

and that's wonderful. And sometimes I use "-p" instead of "-s", because 
I'm not making release notes, but because I'm doing a "whatchanged" with 
full diffs.

In other words, try something like this on the kernel tree:

	git-whatchanged -p include/asm-i386 arch/i386 | less -S

and stare in wonder at just how _useful_ this simple thing is.

Maybe it's just that I've not used all that many different SCM systems,
and I've certainly not used all the features of the ones I _have_ used, so
maybe I never knew, but dammit, I've never seen anybody else do something
quite that useful (at least doing it fast enough that it _remains_
useful).

(Replace "arch/i386" with "drivers/usb" or "fs/ext3" or whatever,
depending on just what your area of interest happens to be).

In other words, I'm very happy with git.

However, git-diff-helper doesn't understand these things, and the builtin
diff doesn't do the rename thing. Yet it would be very very useful to do.

Now, if you do just a

	git-whatchanged include/asm-i386 arch/i386

(or you can even use "-z" if you want to), it turns out that git-diff-tree 
actually does output perfectly usable material. Each "set of diffs" is 
clearly separated, and there is no ambiguous material: all lines are 
either:
 - empty or start with a whitespace
 - start with "diff-tree ", "Author: " or "Date: "
 - are valid input for diff-helper
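
A minimal sketch of that three-way split, assuming the input arrives one line at a time. `classify_line` and the enum names are hypothetical, and the sketch deliberately does not parse the diff records themselves: anything that is neither commentary nor a header is simply treated as a candidate for diff-helper's own parser.

```c
#include <stddef.h>
#include <string.h>

enum line_kind {
	LINE_COMMENTARY,	/* empty, or starts with whitespace */
	LINE_HEADER,		/* "diff-tree ", "Author: " or "Date: " */
	LINE_DIFF_CANDIDATE	/* everything else: hand it to diff-helper */
};

static enum line_kind classify_line(const char *line)
{
	static const char *headers[] = { "diff-tree ", "Author: ", "Date: " };
	size_t i;

	if (line[0] == '\0' || line[0] == '\n' ||
	    line[0] == ' ' || line[0] == '\t')
		return LINE_COMMENTARY;
	for (i = 0; i < sizeof(headers) / sizeof(headers[0]); i++)
		if (!strncmp(line, headers[i], strlen(headers[i])))
			return LINE_HEADER;
	return LINE_DIFF_CANDIDATE;
}
```

Because the three classes never overlap, a filter can decide what to do with a line from its first few bytes, without any look-ahead.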

So what I'd suggest is one (or both) of two possibilities:
 - make the internal diff logic also able to do the same rename handling 
   as the external diff-helper. This may or may not be complex, I've not 
   looked at it.
 - change diff-helper subtly: instead of printing "cannot parse %s", any
   nonrecognized line would be a "ignore this line, but process all
   pending potential renames".
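
The second option can be sketched roughly as follows. This is an illustration only: `struct helper_state`, `feed_line` and `flush_pending` are hypothetical names, the real diff-helper internals are not shown in this thread, and `toy_parse` (which accepts lines starting with '*') is a placeholder for the actual record parser rather than a description of the record format.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for a diff-helper-like filter. */
struct helper_state {
	size_t pending;		/* parsed records waiting for rename matching */
	size_t flushes;		/* how often queued records were resolved */
};

/* Hypothetical parser type: returns nonzero if the line is a diff record. */
typedef int (*parse_fn)(const char *line);

/* Placeholder parser for demonstration only. */
static int toy_parse(const char *line)
{
	return line[0] == '*';
}

/* Resolve everything queued so far; rename matching would happen here. */
static void flush_pending(struct helper_state *st)
{
	if (st->pending) {
		st->pending = 0;
		st->flushes++;
	}
}

/*
 * Queue recognized records.  An unrecognized line is not an error: it is
 * ignored, and it marks the end of the current set of diffs, so whatever
 * is queued gets processed before moving on.
 */
static void feed_line(struct helper_state *st, const char *line,
		      parse_fn parse)
{
	if (parse(line))
		st->pending++;
	else
		flush_pending(st);
}
```

A caller would also invoke `flush_pending` once at end of input, so that the last set of records is not left queued.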

The above would mean that I could either just do

	git-whatchanged include/asm-i386 arch/i386 | git-diff-helper

or continue to use

	git-whatchanged -p include/asm-i386 arch/i386

and still get the "nice" output (and it's a _feature_ that a rename within
the arch/i386 then shows up as a rename, but a rename that crosses the
boundary shows up as a "create" or "delete" event).

Comments? 

			Linus

^ permalink raw reply

