Git development
 help / color / mirror / Atom feed
* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Petr Baudis @ 2005-05-17 21:11 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.21.0505161955340.30848-100000@iabervon.org>

Dear diary, on Tue, May 17, 2005 at 02:10:35AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Mon, 16 May 2005, Linus Torvalds wrote:
> 
> > One final note: I actually think that "rename patches" make a ton of 
> > sense, even if git itself doesn't track renames. If we ever have a "smart 
> > diff" thing that can generate inter-file diffs, I'd like to eventually see
> > 
> > 	diff -git a/kernel/sched.c b/kernel/sched.c.old
> > 	rename kernel/sched.c kernel/sched.c.old
> > 	old mode 100644
> > 	new mode 100755
> 
> I'd like something like:
> 
> diff -git a/kernel/sched.c b/kernel/sched.c.old
> filename -- kernel/sched.c
> filename ++ kernel/sched.c.old
> mode -- 100644
> mode ++ 100755
> --- a/kernel/sched.c
> +++ b/kernel/sched.c.old
> @@ -1,5 +1,5 @@
> (etc.)
> 
> because I actually start thinking of the two sides as "-" and "+", and I'd
> actually have to think about which is "old" and which is "new", and which
> way the "rename" line goes, and so forth. I'd actually be happier with
> just a "mode -- 100644" line for a deleted file, also. If I'm looking at a
> patch, and I read Makefile with '-' and '+' versions of the lists of
> objects, and then get to a "new file" line, I have to think about it to
> associate the '+' side with having the file and the '-' side with not
> having it.

Oops, I've somehow completely missed this mail, but I like this idea a
lot. What do you think, Linus and Junio?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Petr Baudis @ 2005-05-17 20:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds
In-Reply-To: <7vk6lxfybc.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Tue, May 17, 2005 at 09:27:03PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> >> +int path_ignored(const char *path)
> >> +{
> >> +	if (!verify_path(path))
> >> +		return 1;
> >> +
> >> +	/* Put the Porcelain layer ignore logic here.
> >> +	 * Return non-zero if path is to be ignored.
> >> +	 */
> >> +	return 0;
> >> +}
> 
> PB> I actually think you shouldn't. All the Porcelain layers should
> PB> hopefully use the same git toolkit layer, not each one shipping own due
> PB> to differences in things like this.
> 
> What you said above _is_ exactly my intention.  I phrased that
> comment very badly.  It should have said:
> 
>     /* We _will_ put the "ignore logic Porcelain layers agree upon"
>      * here, once we have a concensus.
>      *
>      * The code should return non-zero if path is to be ignored.
>      */
> 
> I did not put any implementation there because I do not think we
> have agreed upon anything yet.  This patch is to establish
> the framework.  

Ok, so this just bad comment. :-) No problem then.


Regarding having the code in the library, well, I'm thinking about why
not to just put this logic into all the git commands. Unfortunately I
can't find the email with Linus' argumentation against that right now.
:-(

> git-run-with-user-path is useful both in implementing
> porcelain-add if the porcelain's policy is to take filesystem
> paths not GIT paths, like this:

Actually, my doubts about general usefulness of this wrapper are
growing. Cogito is unlikely to ever make use of it since it has to
figure out the .git location anyway for own use (it keeps plenty of own
files there). But that's likely what any other porcelain layer would
have to do as well, isn't it? The wrapper could still be useful for the
standalone users, though.

Another thing is, I don't think git-run-with-user-path is the right name.
I think it doesn't make much sense on its own, and the wrapper is
actually doing more anyway, applying the ignore rules. What about
calling it just git-run-wrapper?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: Darcs-git: a few notes for Git hackers
From: Petr Baudis @ 2005-05-17 20:21 UTC (permalink / raw)
  To: Brad Roberts; +Cc: Juliusz Chroboczek, git
In-Reply-To: <Pine.LNX.4.44.0505151204230.2136-100000@bellevue.puremagic.com>

Dear diary, on Sun, May 15, 2005 at 09:06:27PM CEST, I got a letter
where Brad Roberts <braddr@puremagic.com> told me that...
> > > 2) Should the index changing areas be constructing a new index instead of
> > > shuffling bits within the current index?
> >
> > When I have a big cache (the only time it matters), I do usually only
> > relatively small changes to it, so...
> 
> The entire index is bit shuffled around even if nothing changed.  At least
> today, size and amount changed doesn't matter.

At the very least, it probably shouldn't get shuffled around if there
are no changes whatsoever.

> > I'd imagine the plan of attack to continue by changing active_cache to
> > be struct cache, then making it local.
> 
> Which is what the rest of that patch does.

Well, the important word there was "then". :-) I think the patch is too
big, do you think it would be possible to separate it to those two
stages? (I could do it on my own when I get enough time, but who knows
when that happens... ;-)

Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: Darcs-git: a few notes for Git hackers
From: Petr Baudis @ 2005-05-17 20:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brad Roberts, Juliusz Chroboczek, git
In-Reply-To: <7voebcrztl.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Sun, May 15, 2005 at 10:36:54PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> The request to _continue_ using good judgements when to forewarn
> people still stands ;-).  In this case, Petr used good
> judgement.  My apologies to both of you if my knee-jerk reaction
> caused you bad feelings.

No problem here. I'm of course going to consult the mailing list on any
larger-scale code changes.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH Cogito] Improve option parsing for cg-log
From: Petr Baudis @ 2005-05-17 20:16 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: GIT Mailing List
In-Reply-To: <1115975134.18499.94.camel@pegasus>

Dear diary, on Fri, May 13, 2005 at 11:05:34AM CEST, I got a letter
where Marcel Holtmann <marcel@holtmann.org> told me that...
> Hi Petr,

Hi,

> > The -r option still must be after all the other options.
> 
> I see what you mean and it seems that I missed that option. Must be
> because you put the list_commit_files() between them and I assumed that
> there is no further option parsing.

Uhm, I know. :-) It sorta evolved like that. We didn't yet settle down
on where to actually put the functions. I'd say right after . cg-Xlib.

> Do you really wanna keep the double meaning of -r. Depending on a
> previous -r it is $log_start or $log_end.

Yes, by all means. On one side I like the colon notation, on the other
side this was always my biggest usability problem with SVN. And it costs
us nothing and does what the user would after all expect, I think.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 0/4] Pulling refs files
From: Petr Baudis @ 2005-05-17 20:14 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505142306021.30848-100000@iabervon.org>

Dear diary, on Sun, May 15, 2005 at 05:23:18AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Sat, 14 May 2005, Petr Baudis wrote:
> 
> > So what about just something like
> > 
> > 	git-wormhole-pull remote:refs/head/master wormhole://localhost/
> > 
> > That is, you could just specify remote:path_relative_to_url instead of
> > SHA1 id as the commit.
> 
> Do you have any sensible alternatives to "remote:refs/<something>" in
> mind? I suppose that "remote:HEAD" would also work. How are you thinking
> of having the value get written locally?

Anything that gets eventually wound up in the info/ directory. (The name
of the ignore file saved in info/ignore is the current hit.)

> Do you also have some idea for user-invoked rpush? It has to call
> something that writes the value on the other side (and I'd ideally like it
> to do the update atomically and locked against other clients). This series
> uses the same mechanism to write it that it uses to write hashes fetched
> from remote machines.

Well, it'd be again nice to have some generic mechanism for this so that
the user could theoretically push over rsync too or something (although
that'll be even more racy, it is fine for single-user repository).

I think the remote file to write the value inside should be porcelain
business. What you should always check though is that before the pull
(and after the locking) the value in that file is the same as the "push
base". This way you make sure that you are still following a single
branch and in case of multiuser repositories that you were fully merged
before pushing.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH] Make object contents optionally available
From: Daniel Barkalow @ 2005-05-17 19:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git, Petr Baudis
In-Reply-To: <7vbr79fy22.fsf@assigned-by-dhcp.cox.net>

On Tue, 17 May 2005, Junio C Hamano wrote:

> That's what I meant to say by "types are special" but I phrased it so
> badly.

Ah, okay. I'm not convinced that discriminating at all at the point where
parse is deciding whether to free the contents is that useful; doing it
based on type is just sufficiently easy that it could be worthwhile.

> And the callback would solve that naturally.  Or you can
> additionally give the callback the result of the parse_xxx(),
> which would be even more useful.  The callback can then throw
> commit raw-data away based on the date, for example.

I don't think a callback has any advantage over turning on a flag to
provide all unpacked objects, and having the program free them when it
wants to, either right after whatever caused them to be parsed returns
or later. I suspect that the main relevant thing is going to be the stack,
not the data.

I have another idea, which is almost certainly too magical, but which
would work: have library functions that parse objects return them unpacked
if the objects they are given are unpacked. So, for example,
pop_most_recent_commit() would unpack parents of a commit if the commit
was unpacked. This would make everything work the way you'd expect if you
didn't think about it, although it would probably be really surprising if
you were thinking about it. I'm just mentioning this because it might make
someone think of something similar but sane.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* Re: [PATCH] Make object contents optionally available
From: Junio C Hamano @ 2005-05-17 19:32 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, git, Petr Baudis
In-Reply-To: <Pine.LNX.4.21.0505171325150.30848-100000@iabervon.org>

>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:

DB> It wouldn't make any types special; the caller can just control each type
DB> individually, so that code that only cares about commits could get commits
DB> unpacked, but not get trees unpacked even if it has them parsed.

That is exactly the behaviour I am questioning about.  You are
giving the caller the ability to discriminate objects based on
their _type_ (which is fine and better than not having that
control at all).  Why can't the caller discriminate objects
based on their, say, size for example [*1*]?  That's what I
meant to say by "types are special" but I phrased it so badly.
 
And the callback would solve that naturally.  Or you can
additionally give the callback the result of the parse_xxx(),
which would be even more useful.  The callback can then throw
commit raw-data away based on the date, for example.

[Footnote]

*1* So this hypothethical program uses fast cached raw data for
small things but is willing to pay penalty for big things if it
later finds those big things turn out to be needed.


^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Linus Torvalds @ 2005-05-17 19:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vu0l1fz6p.fsf@assigned-by-dhcp.cox.net>



On Tue, 17 May 2005, Junio C Hamano wrote:
> 
> I've been thinking about doing some rename detection in
> diff-helper for some time.  Here is what that would produce in
> your proposed file format (BTW, wouldn't the earlier patch ready
> for merge already?), if you move file frotz to file nitfol and
> at the same time do some edits:

This has the advantage of working with any old "patch" version, but it has 
the disadvantage of being human-unreadable, and big. 

To me, there really are only two reasons to do rename diffs:
 - smaller diffs
 - human readability (you can actually see what changed)

and if you want to have compatibility with a "patch" program that doesn't
support the feature (like your example), you basically lose both of those
advantages. You have _some_ human-readability, but it basically boils down
to "ignore all those deletes/creates".

So I'd really suggest having just a flag that says "pure old diff format"  
or "new diff format with renames", and if the latter is selected, then do
_just_ the changes, ie the rename+change case would really boil down to
getting just

>     diff -git a/nitfol b/nitfol
>     rename old frotz
>     rename new nitfol
>     --- a/nitfol
>     +++ b/nitfol
>     @@ -1,2 +1,3 @@
>      xyzzy
>      rezrov
>     +gnusto

(except I think it would be nice to have the renamed names show up in the 
"diff" and "---/+++" lines too)

		Linus

^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-17 19:27 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, torvalds
In-Reply-To: <20050517190355.GA7136@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

>> +int path_ignored(const char *path)
>> +{
>> +	if (!verify_path(path))
>> +		return 1;
>> +
>> +	/* Put the Porcelain layer ignore logic here.
>> +	 * Return non-zero if path is to be ignored.
>> +	 */
>> +	return 0;
>> +}

PB> I actually think you shouldn't. All the Porcelain layers should
PB> hopefully use the same git toolkit layer, not each one shipping own due
PB> to differences in things like this.

What you said above _is_ exactly my intention.  I phrased that
comment very badly.  It should have said:

    /* We _will_ put the "ignore logic Porcelain layers agree upon"
     * here, once we have a concensus.
     *
     * The code should return non-zero if path is to be ignored.
     */

I did not put any implementation there because I do not think we
have agreed upon anything yet.  This patch is to establish
the framework.  

The second patch is separate, because it is _my_ version of the
ignore logic proposal, to serve as a sample.  Whatever ignore
logic is agreed upon, that _will_ be in the place you pointed
out and there will be no choice.  Everybody _will_ use the
ignore logic.

>> +/****************************************************************/
>> +
>> +/* Path canonicalization part */

PB> And why is this in the library?

Why not?  It is something other programs would eventually find
useful.

Also the second patch, a sample implementation of ignore logic I
proposed, wants to know GIT_PROJECT_TOP to figure out the file
pointed at by GIT_DIR/.git/info/ignore-file.

Also it would not hurt if you are always running from the
project top and give only verify_path() approved paths.  Then
canon_path would become identity function.

git-run-with-user-path is useful both in implementing
porcelain-add if the porcelain's policy is to take filesystem
paths not GIT paths, like this:

    #!/bin/sh
    # porcelain-add
    exec git-run-with-user-path git-update-cache --add -- -- "$@"

Also if the porcelain's policy is to take GIT paths not
filesystem paths, then users can say:

    $ find . ! -type d -print0 |
      xargs -0 git-run-with-user-path cg-add --

You cannot use both for obvious reasons.


^ permalink raw reply

* Re: [PATCH] improved delta support for git
From: Thomas Glanzmann @ 2005-05-17 19:10 UTC (permalink / raw)
  To: git
In-Reply-To: <20050517182232.GM13508@cip.informatik.uni-erlangen.de>

Hi,
okay I got it. Fragmentation:

before:
	(medion) [/scratch/mutt/mutt-cvs] du -sh --apparent-size .git/objects/
	49M     .git/objects/

after:
	(medion) [/scratch/mutt/mutt-cvs] du -sh --apparent-size .git/objects/
	19M     .git/objects/

cvs repository:

	(faui00u) [~/work/git/yagf] du -sh --apparent-size ../../mutt/cvsrepository
	33M     ../../mutt/cvsrepository

Sincerely,
	Thomas

^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Junio C Hamano @ 2005-05-17 19:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0505170812060.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> There's also a real technical reason for this: since the rename format
LT> would not be a valid patch for a traditional "patch" program, and if we
LT> ever want to actually teach "patch" to handle it, we really need to be
LT> explicit. There are tons of traditional patches around that say

LT> 	diff -Nur a/kernel/sched.c.old b/kernel/sched.c
LT> 	--- a/kernel/sched.c.old
LT> 	+++ b/kernel/sched.c
LT> 	...

LT> and clearly the above is _not_ a rename from "sched.c.old" to "sched.c",
LT> so if we want to teach "patch" about the magic git rules, we'd have to
LT> have something unambiguous that a GNU patch maintainer might be willing to
LT> trigger on. The combination of the "diff -git " and "rename" markers might
LT> be such a thing.

LT> So it's a combination of clarity, canonical names, and "patch" issues.

I've been thinking about doing some rename detection in
diff-helper for some time.  Here is what that would produce in
your proposed file format (BTW, wouldn't the earlier patch ready
for merge already?), if you move file frotz to file nitfol and
at the same time do some edits:

    diff -git a/frotz b/frotz
    rename old frotz
    rename new nitfol
    delete file mode 100644
    --- a/frotz
    +++ /dev/null
    @@ -1,2 +0,0 @@
    -xyzzy
    -rezrov
    diff -git a/nitfol b/nitfol
    rename old frotz
    rename new nitfol
    new file mode 100644
    --- /dev/null
    +++ b/nitfol
    @@ -0,0 +1,2 @@
    +xyzzy
    +rezrov
    diff -git a/nitfol b/nitfol
    rename old frotz
    rename new nitfol
    --- a/nitfol
    +++ b/nitfol
    @@ -1,2 +1,3 @@
     xyzzy
     rezrov
    +gnusto

The basic idea is to express the pure rename with traditional
two patches against /dev/null, plus optionally contents patch on
top after pure rename patches.

I am still debating myself where rename lines should be, though.
I cannot decide so I placed them in all three in the above
example.



^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Petr Baudis @ 2005-05-17 19:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds
In-Reply-To: <7voebbpuxs.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Mon, May 16, 2005 at 08:05:19AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> --- a/paths.c
> +++ b/paths.c
> @@ -0,0 +1,199 @@
> +static int initialize_ignore_list(void)
> +{
> +	/* Put the Porcelain layer ignore logic initialization here.
> +	 * Return non-zero after issuing appropriate error message
> +	 * if initialization fails.
> +	 */
> +	return 0;
> +}
> +
> +int path_ignored(const char *path)
> +{
> +	if (!verify_path(path))
> +		return 1;
> +
> +	/* Put the Porcelain layer ignore logic here.
> +	 * Return non-zero if path is to be ignored.
> +	 */
> +	return 0;
> +}

I actually think you shouldn't. All the Porcelain layers should
hopefully use the same git toolkit layer, not each one shipping own due
to differences in things like this.

If we don't agree on something common (implemented in a way to be
still circumventable by a porcelain layer if desired), I wouldn't put
the ignore logic inside at all.

> +/****************************************************************/
> +
> +/* Path canonicalization part */

And why is this in the library?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH] improved delta support for git
From: Thomas Glanzmann @ 2005-05-17 19:02 UTC (permalink / raw)
  To: git
In-Reply-To: <20050517182232.GM13508@cip.informatik.uni-erlangen.de>

Hello,
btw. 6 Megabyte are spend on commit and tree objects:

(medion) [/scratch/mutt/mutt-cvs] find .git/objects/ -type f | sed 's^.*\(..\)/^\1^' | while read FILE; do echo -n "$FILE " ;git-cat-file -t $FILE; done | egrep 'tree|commit' | awk '{print $1}' | sed 's,^\(..\),.git/objects/\1/,' | xargs ls -l | awk '{sum += $5} END {print sum}'
6179105

I know my shell sux.

	Thomas

^ permalink raw reply

* Re: [PATCH] improved delta support for git
From: Thomas Glanzmann @ 2005-05-17 18:22 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.62.0505112309480.5426@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 828 bytes --]

Hello Nicolas,
I just tried it against Linus git-HEAD. It worked like a charm so far. I
imported mutt-1.5 CVS branch using cvsps (1103 patches).

(medion) [/scratch/mutt/mutt-cvs] git-rev-tree HEAD | wc -l
1103
(medion) [/scratch/mutt/mutt-cvs] du -sh .git/objects/
63M     .git/objects/
(medion) [/scratch/mutt/mutt-cvs] git-deltafy-script -d 2000
...
(medion) [/scratch/mutt/mutt-cvs] du -sh .git/objects/
35M     .git/objects/

Maybe you should add git-deltafy-script and git-mkdelta to the installation
targets (patch attached).

And I wonder why the mutt CVS Repository is still smaller than the
zdelta compressed mutt git repository. And with mutt CVS Repository I
mean every commit since mutt-0.9x not only mutt-1.5 branch?

(faui03) [~] du -sh work/mutt/cvsrepository
34M     work/mutt/cvsrepository

Greetings,
	Thomas

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 802 bytes --]

[PATCH] Install git-mkdelta and git-deltafy-script

Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>

--- a/Makefile
+++ b/Makefile
@@ -19,7 +19,7 @@
 INSTALL=install
 
 SCRIPTS=git-apply-patch-script git-merge-one-file-script git-prune-script \
-	git-pull-script git-tag-script git-resolve-script
+	git-pull-script git-tag-script git-resolve-script git-deltafy-script
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
@@ -28,7 +28,7 @@
 	git-unpack-file git-export git-diff-cache git-convert-cache \
 	git-http-pull git-rpush git-rpull git-rev-list git-mktag \
 	git-diff-helper git-tar-tree git-local-pull git-write-blob \
-	git-get-tar-commit-id
+	git-get-tar-commit-id git-mkdelta
 
 all: $(PROG)
 

^ permalink raw reply

* Re: [PATCH] Make object contents optionally available
From: Daniel Barkalow @ 2005-05-17 17:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git, Petr Baudis
In-Reply-To: <7vfywlhj3h.fsf@assigned-by-dhcp.cox.net>

On Tue, 17 May 2005, Junio C Hamano wrote:

> >>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:
> 
> DB> I'm already going to add a per-type global to have the parse functions
> DB> also unpack the object contents user-visibly, for the case that Junio
> DB> pointed out. (Making it: parse_* doesn't change whether the contents are
> DB> unpacked, unless you tell it to unpack objects.)
> 
> That sounds better than the patch you sent last night, but are
> we sure that callers would be satisfied if you just make some
> _types_ more special than others?

It wouldn't make any types special; the caller can just control each type
individually, so that code that only cares about commits could get commits
unpacked, but not get trees unpacked even if it has them parsed.

> Parse functions need to unpack anyway, so it might make more
> sense to add an optional callback just after where unpacking
> happens to ask the main program if it wants the unpacked raw
> data to be kept.

I think that a callback is a bit excessive, and it wouldn't get the useful
information anyway, which is really: "what is supposed to happen to this
object, which you haven't seen before, after it's parsed?" 

> So you would do something like:
>     struct object *parse_object(unsigned char *sha1)
>     {

That reminds me that I should also fix the parse_object path, which can
now be simplified substantially.

Note that parse_object calls parse_<type>, rather than the reverse,
because parse_<type> can skip the step of figuring out what routine to
use. The parse_<type> version also has lookup_<type>, which can return the
struct which will be filled out later; this isn't possible for an object
of unknown type, since it can't tell what struct type to allocate.

> Another possibility is not to make double unpacking too costly
> by having an LRU of unpack_sha1_file(), but I am not sure how
> effective that would be.

It would work for pop_most_recent_commit, which parses the thing it's
about to return, but I don't think it's a particularly good solution in
general.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* Re: git-rev-list  in local commit order
From: Linus Torvalds @ 2005-05-17 17:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sean, git
In-Reply-To: <1116349507.17296.31.camel@tglx.tec.linutronix.de>



On Tue, 17 May 2005, Thomas Gleixner wrote:
>
> On Tue, 2005-05-17 at 08:43 -0700, Linus Torvalds wrote:
> > > My idea of repository id was not the notion of workspace seperation. I
> > > dont care in which directory and on which machine you or who ever
> > > commits a line of code. I care where the change appears in a public
> > > repository, which is unique.
> > 
> > You seem to think that the repository on master.kernel.org is more 
> > important than the one on my private machine, and you're _wrong_.
> 
> For me yes, as I have no access to your private ones and I can only rely
> on the integrity of the public accessible ones.
> 
> For the individual developer the private workspaces are surely more
> important. I never doubted that, but I do not care whether you use one
> or ten workspaces and which one of them you blow away or use for
> updating of master.kernel.org. 

But how would you track "repositoryness", when the repository you care 
about has absolutely nothing to do with the repositories that any of the 
developers who created it in the first place care about?

See the problem? You can't. You seem to want to track information that
simply does not _exist_.

Put another way: the repository ID of the eventual public "target"  
repository only becomes available once the information has been pushed
there, not before. So a "commit" cannot contain that information, because
at commit time, you fundamentally cannot know what the eventual public
repository (if any) will be.

So the public repo really is nothing but a shadow of the real work, and 
the only reliable ordering you can do will have to depend on local 
information (ie things like the committer "email" value).

Now, what you _can_ do (and what the snapshot mechanism and the commit 
mailing list scripts do) is to create a "publicly visible timeline" thing, 
ie you can at regular intervals generate a snapshot of "what is the state 
of public repo X" and you'll get a "local commit ordering" from that.

But that local commit ordering will fundamentally depend on exactly when
and how often you do the snapshotting and when I (or somebody else)
happened to push to that public repo, so it will inevitably be something
you can never re-create later from just the final repository contents.  
IOW, it's not something that "git-rev-list" can re-create - the only way
to recreate it is literally to build up a separate list of "what was the
head commit at time X" outside of the repository.

		Linus

^ permalink raw reply

* Re: git-rev-list  in local commit order
From: Thomas Gleixner @ 2005-05-17 17:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sean, git
In-Reply-To: <Pine.LNX.4.58.0505170833330.18337@ppc970.osdl.org>

On Tue, 2005-05-17 at 08:43 -0700, Linus Torvalds wrote:
> > My idea of repository id was not the notion of workspace seperation. I
> > dont care in which directory and on which machine you or who ever
> > commits a line of code. I care where the change appears in a public
> > repository, which is unique.
> 
> You seem to think that the repository on master.kernel.org is more 
> important than the one on my private machine, and you're _wrong_.

For me yes, as I have no access to your private ones and I can only rely
on the integrity of the public accessible ones.

For the individual developer the private workspaces are surely more
important. I never doubted that, but I do not care whether you use one
or ten workspaces and which one of them you blow away or use for
updating of master.kernel.org. 

tglx



^ permalink raw reply

* Re: [PATCH] Make object contents optionally available
From: Junio C Hamano @ 2005-05-17 17:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, git, Petr Baudis
In-Reply-To: <Pine.LNX.4.21.0505171130460.30848-100000@iabervon.org>

>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:

DB> I'm already going to add a per-type global to have the parse functions
DB> also unpack the object contents user-visibly, for the case that Junio
DB> pointed out. (Making it: parse_* doesn't change whether the contents are
DB> unpacked, unless you tell it to unpack objects.)

That sounds better than the patch you sent last night, but are
we sure that callers would be satisfied if you just make some
_types_ more special than others?

Parse functions need to unpack anyway, so it might make more
sense to add an optional callback just after where unpacking
happens to ask the main program if it wants the unpacked raw
data to be kept.  So you would do something like:

    struct object *parse_object(unsigned char *sha1)
    {
   ...
                    unsigned long size;
                    void *buffer = unpack_sha1_file(map, mapsize, type, &size);
                    munmap(map, mapsize);
                    if (!buffer)
                            return NULL;
   ...             
                    } else {
                            obj = NULL;
                    }
+		    if (obj && keep_object_raw_data(sha1, type, size, buffer)) {
+                           obj.raw_data = buffer;
+                           obj.raw_size = size;
+                   } else 
                            free(buffer);
                    return obj;
            }
            return NULL;
    }

And put a dummy implementation of keep_object_raw_data() that
says "I do not want anything to be kept" in the libgit.a.  Main
programs that _care_ about raw data can implement their own
keep_object_raw_data() callback that is more intelligent.
In addition you give them the interface to free raw data you
already have.

DB> I think the only likely bug would be unpacking objects after parsing
DB> them, instead of before, which is correct but inefficient. It should be
DB> clear to a user whether the raw contents are available at any point in the
DB> user code.

Another possibility is not to make double unpacking too costly
by having an LRU of unpack_sha1_file(), but I am not sure how
effective that would be.


^ permalink raw reply

* Re: [PATCH] Make object contents optionally available
From: Daniel Barkalow @ 2005-05-17 15:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Petr Baudis, Junio C Hamano
In-Reply-To: <Pine.LNX.4.58.0505170826061.18337@ppc970.osdl.org>

On Tue, 17 May 2005, Linus Torvalds wrote:

> 
> 
> On Tue, 17 May 2005, Daniel Barkalow wrote:
> >
> > This adds contents and size fields to struct object. If unpack_object is
> > called on an object, it will fill in the contents field with the complete
> > raw contents of the object. If free_object_contents is called on an
> > object, the contents will be freed. If contents is filled when an object
> > is parsed, it is not unpacked an extra time, but the contents are not
> > retained if they were not unpacked before parsing.
> 
> I really hate magic interfaces like that. It's just a bug waiting to 
> happen.

I think I described it with too many cases above. parse_*() doesn't change
whether the contents are unpacked. The obvious optimization is performed 
if possible.

I'm already going to add a per-type global to have the parse functions
also unpack the object contents user-visibly, for the case that Junio
pointed out. (Making it: parse_* doesn't change whether the contents are
unpacked, unless you tell it to unpack objects.)

I think the only likely bug would be unpacking objects after parsing
them, instead of before, which is correct but inefficient. It should be
clear to a user whether the raw contents are available at any point in the
user code.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* Re: [RFH] Janitor projects around core GIT
From: Ed L Cashin @ 2005-05-17 15:49 UTC (permalink / raw)
  To: git
In-Reply-To: <4289799A.3040204@pobox.com>

Jeff Garzik <jgarzik@pobox.com> writes:

> Junio C Hamano wrote:
>>  * Rewrite command line parsing code, probably using GNU getopt.
>>    I have three gripes about option parsing in the current code:
>
>
> Use argp.  It supports short and long options, and is highly
> flexible. "info argp" should work on most Linux boxes.

Or "info libc argp" (on my debian sarge system).

-- 
  Ed L Cashin <ecashin@coraid.com>


^ permalink raw reply

* Re: git-rev-list  in local commit order
From: Linus Torvalds @ 2005-05-17 15:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sean, git
In-Reply-To: <1116323520.17296.12.camel@tglx.tec.linutronix.de>



On Tue, 17 May 2005, Thomas Gleixner wrote:
> 
> What you blow away is a work space. But at the end you push the result
> of whatever work space you kept into a public available repository. Also
> BK stores a somewhat hidden repository (not workspace) id.

No.

The public repo is secondary. Really. It has no meaning. The only thing 
that matters is what you call "workspace".

> My idea of repository id was not the notion of workspace seperation. I
> dont care in which directory and on which machine you or who ever
> commits a line of code. I care where the change appears in a public
> repository, which is unique.

You seem to think that the repository on master.kernel.org is more 
important than the one on my private machine, and you're _wrong_.

It's the _private_ repositories that are the important ones. The public 
ones are a communication channel, nothing more. They have no importance on 
their own.

I've blown the public one away several times. With BK, we've had disk
corruption on kernel.org, we've had break-ins on bkbits.net, and we've had
repository corruption due to people editing the SCCS files by hand. Any
number of silly problems, in other words. The result? Blow the public tree
away, restore it from one of the private ones from a machine that you
trust.

I _never_ look at my public tree. I literally have a small script called 
"push-all" in my git repositories, and it does:

	#!/bin/sh
	echo master.kernel.org:
	rsync -av --delete --exclude-from=.exclude .git/ master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
	...

ie it just pushes my stuff to a few other places.

In other words, the public stuff is the _slave_. It has no meaning. The 
only important one is the one that the _developer_ works on.

Of course, this is not to say that everybody needs to take my approach.
The nice thing about distributed systems is that a centralized system is
just a trivial special case of them, so somebody else, who uses git as if
it were CVS, could say "repo xxxx at git-master:/pub/git-root/project is
the 'main' repository, and all the workspaces are just temporary
workspaces".

But from a git _design_ point (and from a kernel usage point), the belief
that a "workspace" is somehow less important than a "central repository"
is just very very very wrong. Each workspace is it's own repository, and 
it's the _local_ ones that matter, not some "central repository".

		Linus

^ permalink raw reply

* Re: [PATCH] Make object contents optionally available
From: Linus Torvalds @ 2005-05-17 15:29 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Petr Baudis, Junio C Hamano
In-Reply-To: <Pine.LNX.4.21.0505170049480.30848-100000@iabervon.org>



On Tue, 17 May 2005, Daniel Barkalow wrote:
>
> This adds contents and size fields to struct object. If unpack_object is
> called on an object, it will fill in the contents field with the complete
> raw contents of the object. If free_object_contents is called on an
> object, the contents will be freed. If contents is filled when an object
> is parsed, it is not unpacked an extra time, but the contents are not
> retained if they were not unpacked before parsing.

I really hate magic interfaces like that. It's just a bug waiting to 
happen.

I'd actually prefer it if "parse_object()" just always unpacks it, and 
leaves the unpacked thing in memory. 

Then, the _few_ programs that really care about memory use because they
parse potentially millions of objects (especially blobs, which obviously
can be very large) can be updated to just do a manual
"free_object_contents()". That's primarily things like fsck and
convert-cache, I suspect.

That would leave a lot less special cases and strange rules..

		Linus

^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Linus Torvalds @ 2005-05-17 15:20 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git
In-Reply-To: <20050517070158.GA10031@pasky.ji.cz>



On Tue, 17 May 2005, Petr Baudis wrote:
> > 
> > 	diff -git a/kernel/sched.c b/kernel/sched.c.old
> > 	rename kernel/sched.c kernel/sched.c.old
> 
> Actually, if the git diff format is fixed, do we even need the explicit
> rename line? It could be enough if the filenames on the diff line would
> be just different. Or you want it because of clarity?

Yes, it's something we can glean from the header itself (or the ---/+++
lines), but I'd prefer it just to make things really obvious. Especially
as all the other pathnames involved (both on the "diff" header line and on
the ---/+++ lines) are in non-canonical -p1 format. So the "rename" line
would be the only one that is actually in canonical form.

There's also a real technical reason for this: since the rename format
would not be a valid patch for a traditional "patch" program, and if we
ever want to actually teach "patch" to handle it, we really need to be
explicit. There are tons of traditional patches around that say

	diff -Nur a/kernel/sched.c.old b/kernel/sched.c
	--- a/kernel/sched.c.old
	+++ b/kernel/sched.c
	...

and clearly the above is _not_ a rename from "sched.c.old" to "sched.c",
so if we want to teach "patch" about the magic git rules, we'd have to
have something unambiguous that a GNU patch maintainer might be willing to
trigger on. The combination of the "diff -git " and "rename" markers might
be such a thing.

So it's a combination of clarity, canonical names, and "patch" issues.

		Linus

^ permalink raw reply

* Name of test directory (was: [RFH] Janitor projects around core GIT)
From: Kevin Smith @ 2005-05-17 12:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vpsvqihkh.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
>  * Extend coverage of tests to more commands in the t/ directory.

Thanks for adding unit tests to the project! I may have missed it, but 
why is the directory named t/ instead of tests/ ?

Kevin

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox