Git development
* Re: Git-commits mailing list feed.
From: Paul Jakma @ 2005-04-25  1:35 UTC
  To: David A. Wheeler
  Cc: Linus Torvalds, Sean, Thomas Glanzmann, David Woodhouse,
	Jan Dittmer, Greg KH, Kernel Mailing List, Git Mailing List
In-Reply-To: <426C4168.6030008@dwheeler.com>

On Sun, 24 Apr 2005, David A. Wheeler wrote:

> It may be better to have them as simple detached signatures, which 
> are completely separate files (see gpg --detached). Yeah, gpg 
> currently implements detached signatures by repeating what gets 
> signed, which is unfortunate, but the _idea_ is the right one.

Hmm, what do you mean by "repeating what gets signed"?

> Yes, and see my earlier posting.  It'd be easy to store signatures in
> the current objects directory, of course.  The trick is to be able
> to go from signed-object to the signature;

Two ways:

1. An index of sigs to signed-object.

(or more generally: objects to referring-objects)

2. Just give people the URI of the signature, let them (or their
    tools) follow the 'parent' link to the object of interest

> this could be done just by creating a subdirectory using a variant 
> of the name of the signed-object's file, and in that directory 
> store the hash values of the signatures.  E.G.:

> 00/
>    3b128932189018329839019          <- object to sign
>    3b128932189018329839019.d/
>    0143709289032890234323451
> 01/
>    43709289032890234323451          <- signature

You could hack it into the namespace somehow, I guess. I'm not sure 
hacking it in would be a good thing though.

I think it might be more useful just to provide a general index to 
look up 'referring' objects (if git does not already - I don't think it 
does, but I don't know enough to know for sure). So you could ask 
"which {commit,tag,signature,tree}(s) refer(s) to this object?" - 
that general concept will always work. If you wanted to make the 
implementation of this index use some kind of subdirectory as in the 
above, fine.

See also method 2 above, which would be more efficient for tools if, 
within a project, some developers sign their 'updates' and some 
don't (you never need to check whether there's a signature or not - 
you'll know it from the URI automatically).

> There are LOTS of reasons for storing signatures so that they can 
> be checked later on, just like there are lots of reasons for 
> storing old code... they give you evidence that the reputed history 
> is true (and if you doubt it, they give you a way to limit the 
> doubt).

Indeed.

Anyway, we shall see what Linus does. :)

(But I do hope at least that signatures are /not/ included inline 
using BEGIN PGP.. in the object that is signed.)

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
To err is human, to purr feline.
To err is human, two curs canine.
To err is human, to moo bovine.


* Re: [FILE] Docs update
From: Daniel Barkalow @ 2005-04-25  1:34 UTC
  To: David Greaves; +Cc: Petr Baudis, Linus Torvalds, GIT Mailing Lists
In-Reply-To: <426BF790.9070406@dgreaves.com>

The current merge-base finds the common ancestor with the most recent
date. The old algorithm was giving some surprising results, where it
didn't always take advantage of a straight line from one side to the
other. At some point, I'm going to try to have it find the ancestor whose
shorter path (to either side) is the shortest, which I think should work
best of all.

In any case, I think your documentation should just say it finds as good a
common ancestor as possible, since that's what it's really for, regardless
of the details of how it decides. Also, it shouldn't be depended on to
decide in any particular way.
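
A minimal sketch of the date-based rule described above, assuming a
hypothetical in-memory `struct commit` with at most two parents (this is
not git's actual data structure, nor the real merge-base code): mark every
ancestor of one head, then walk the other head's ancestors and keep the
newest commit that already carries the mark.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2

/* Hypothetical commit node, for illustration only. */
struct commit {
    long date;                          /* commit timestamp in seconds */
    struct commit *parent[MAX_PARENTS]; /* unused slots are NULL */
    unsigned flags;                     /* scratch bits used by the walk */
};

enum { SEEN_A = 1u, SEEN_B = 2u };

/* Mark c and every ancestor of c as reachable from the first head. */
static void mark_ancestors(struct commit *c)
{
    int i;

    if (c == NULL || (c->flags & SEEN_A))
        return;
    c->flags |= SEEN_A;
    for (i = 0; i < MAX_PARENTS; i++)
        mark_ancestors(c->parent[i]);
}

/* Walk c and its ancestors once; remember the newest one marked SEEN_A. */
static void scan_for_common(struct commit *c, struct commit **best)
{
    int i;

    if (c == NULL || (c->flags & SEEN_B))
        return;
    c->flags |= SEEN_B;
    if ((c->flags & SEEN_A) && (*best == NULL || c->date > (*best)->date))
        *best = c;
    for (i = 0; i < MAX_PARENTS; i++)
        scan_for_common(c->parent[i], best);
}

/* Returns the common ancestor with the most recent date, or NULL if the
 * two heads share no history. */
struct commit *merge_base_by_date(struct commit *a, struct commit *b)
{
    struct commit *best = NULL;

    assert(a != NULL && b != NULL);
    mark_ancestors(a);
    scan_for_common(b, &best);
    return best;
}
```

The `flags` fields have to be cleared before the walk is repeated on the
same graph. When one head is an ancestor of the other, that head itself is
returned, provided dates increase from parent to child.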

	-Daniel
*This .sig left intentionally blank*



* Re: Date handling.
From: Russ Allbery @ 2005-04-25  1:32 UTC
  To: git
In-Reply-To: <20050425012216.GH29939@delft.aura.cs.cmu.edu>

Jan Harkes <jaharkes@cs.cmu.edu> writes:

> As Russ mentioned, that probably doesn't work with daylight savings
> time. However I did some testing and it looks like the following lines
> around mktime make it work as we would expect.

>     tm.tm_isdst = -1;
>     then = mktime(&tm);
>     then += tm.tm_gmtoff;

> Attached is the program I used to test it, it seems pretty much unfazed
> by changes to the TZ environment variable. Although I tested around a
> daylight savings time switch, I'm still not 100% sure if it doesn't mess
> up in some corner case.

I don't know what sort of portability you're striving for, but many
platforms don't have tm.tm_gmtoff.  But reimplementing mktime from scratch
isn't particularly hard so long as you don't need some of the "extra"
features of mktime (canonicalizing a struct tm or accepting out of range
values and doing the "right thing").
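
For illustration, a minimal sketch of such a reimplementation, assuming a
normalized `struct tm` (no out-of-range fields) whose fields are already in
UTC. The names `days_from_civil` and `mktime_utc_sketch` are made up for
this example; this is not the INN `mktime_utc` code. It uses plain
Gregorian calendar arithmetic and ignores leap seconds, as POSIX time_t
does.

```c
#include <assert.h>
#include <time.h>

/* Days from 1970-01-01 to the given Gregorian date; m is 1..12, d is 1..31. */
static long days_from_civil(long y, int m, int d)
{
    long era, yoe, doy, doe;

    y -= (m <= 2);                       /* treat Jan and Feb as end of previous year */
    era = (y >= 0 ? y : y - 399) / 400;  /* 400-year cycle number */
    yoe = y - era * 400;                 /* year within the cycle, 0..399 */
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* day of year from 1 March */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* day within the cycle */
    return era * 146097 + doe - 719468;  /* shift so that 1970-01-01 is day 0 */
}

/* Convert broken-down UTC time to seconds since the epoch without
 * consulting TZ, tm_isdst or tm_gmtoff. Does not canonicalize *tm. */
time_t mktime_utc_sketch(const struct tm *tm)
{
    long days;

    assert(tm != NULL);
    days = days_from_civil(tm->tm_year + 1900L, tm->tm_mon + 1, tm->tm_mday);
    return (time_t)days * 86400
        + tm->tm_hour * 3600L + tm->tm_min * 60L + tm->tm_sec;
}
```

For an RFC 2822 date, the zone offset (in seconds east of UTC) would then
be subtracted from the result.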

I came in a little late to this discussion, but I gather that the overall
goal here is parsing RFC 2822 dates.  You're all certainly welcome to take
the code that I wrote for INN to do this if you wish, although it parses
the full RFC 2822 syntax and therefore may accept things you consider
insane (comments, newlines, etc.)  Or you're welcome to cherry-pick bits
and pieces out of it (like mktime_utc).  This code has a fairly extensive
test suite and has also been tested against the old INN parsedate function
on ~2M Usenet articles.

All of this code is my own work, and as far as I'm concerned it's in the
public domain or as close an approximation to that as one can get in
your local legal environment.

The code is largish and needs some Autoconf support, so I won't just send
it to the list unless someone wants it, but let me know if you do.  You
can also get it by downloading INN from:

    <ftp://ftp.isc.org/isc/inn/snapshots/>

(getting the latest CURRENT snapshot) and looking in lib/date.c.  You
don't need the parsedate_nntp stuff, and you probably don't care about
parsedate_rfc2822_lax, which accepts common violations of RFC 2822 syntax
found in Usenet messages.  The test suite is in tests/lib/date-t.c.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>


* Re: Git-commits mailing list feed.
From: David Woodhouse @ 2005-04-25  1:26 UTC
  To: Linus Torvalds
  Cc: Jan Dittmer, Greg KH, Kernel Mailing List, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504231010580.2344@ppc970.osdl.org>

On Sat, 2005-04-23 at 10:31 -0700, Linus Torvalds wrote:
> In other words, I actually want to create "tag objects", the same way we 
> have "commit objects". A tag object points to a commit object, but in 
> addition it contains the tag name _and_ the digital signature of whoever 
> created the tag.

I'm slightly concerned that if we do _just_ the above, finding a given tag
by its name would be a fairly slow process. I suspect you'll want
a .git/tags/ directory _anyway_, but with named files which refer to tag
objects, instead of directly to commit objects as in Petr's current
implementation.

Other operations we might want to be at least _reasonably_ efficient
would include 'show me the latest tag from Linus' and 'show me all
extant tags'.

-- 
dwmw2



* Re: Date handling.
From: Jan Harkes @ 2005-04-25  1:22 UTC
  To: David Woodhouse; +Cc: Linus Torvalds, git
In-Reply-To: <1114324729.3419.78.camel@localhost.localdomain>

On Sun, Apr 24, 2005 at 04:38:49PM +1000, David Woodhouse wrote:
> On Sat, 2005-04-23 at 23:04 -0400, Jan Harkes wrote:
> > I noticed that some commit timestamps seemed to be off, looking into it
> > a bit more it seems like mktime is influenced by the setting of the
> > local TZ environment.
> 
> Ewww. I missed that in the documentation. I suppose I should have worked
> it out having empirically determined that it ignores the tm_gmtoff
> field.
> 
> > The question is, do we want to just calculate the time_t offset
> > ourselves without using mktime, or force the TZ environment to UTC.
> 
> I don't think we want to be in the business of counting leap seconds; we
> need to let the system do it. I don't much like setting TZ to UTC though
> -- how about we use your test case to find the offset and subtract that?
> 
> Does this work?

As Russ mentioned, that probably doesn't work with daylight savings
time. However I did some testing and it looks like the following lines
around mktime make it work as we would expect.

    tm.tm_isdst = -1;
    then = mktime(&tm);
    then += tm.tm_gmtoff;

Attached is the program I used to test it, it seems pretty much unfazed
by changes to the TZ environment variable. Although I tested around a
daylight savings time switch, I'm still not 100% sure if it doesn't mess
up in some corner case.

Jan


#include <time.h>
#include <stdlib.h>
#include <stdio.h>

time_t mkutctime(struct tm *tm, int offset)
{
    time_t time;

    /* we don't know whether our timezone happens to be dst or not, let libc
     * figure that one out. */
    tm->tm_isdst = -1;

    /* interpret struct tm in the local timezone */
    time = mktime(tm);
    if (time == -1) return -1;

    /* libc lets us know how many seconds our local time differs from UTC
     * this is a non-standard BSD extension, which is probably not as
     * portable, but it seems to work. */
    time += tm->tm_gmtoff;

    /* However as the passed in struct tm was not UTC but in some other
     * timezone, we still have to subtract the offset that came with the
     * RFC2822 date */
    time -= offset;

    return time;
}

int main(void)
{
    struct tm tm = { 0, };
    time_t time;

    /* time_t has no portable printf conversion, so cast to long */
    tm.tm_year = 70;
    tm.tm_mday = 1;
    time = mkutctime(&tm, 0);
    printf("1970-01-01 00:00:00 UTC = 0 (%ld)\n", (long)time);

    tm.tm_year = 105;
    tm.tm_mon = 2;
    tm.tm_mday = 17;
    tm.tm_hour = 20;
    tm.tm_min = 58;
    tm.tm_sec = 31;
    time = mkutctime(&tm, -5 * 3600);
    printf("2005-03-17 20:58:31 EST = 1111111111 (%ld)\n", (long)time);

    tm.tm_mon = 3;
    tm.tm_mday = 3;
    tm.tm_hour = 1;
    tm.tm_min = 59;
    tm.tm_sec = 59;
    time = mkutctime(&tm, -5 * 3600);
    printf("2005-04-03 01:59:59 EST = 1112511599 (%ld)\n", (long)time);

    tm.tm_hour = 3;
    tm.tm_min = 0;
    tm.tm_sec = 0;
    time = mkutctime(&tm, -4 * 3600);
    printf("2005-04-03 03:00:00 EDT = 1112511600 (%ld)\n", (long)time);

    return 0;
}



* [PATCH] Show which files was changed in the git log output
From: Jonas Fonseca @ 2005-04-25  1:22 UTC
  To: Petr Baudis; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 510 bytes --]

Hi,

I have attached a tentative patch to optionally have the git log command
list which files were changed by a commit.

It would be nice if someone with a more in depth knowledge of which
sha1s to pass to diff-tree when there are multiple parents could comment
on whether it does the right thing.

Right now it does the following: if there is only one parent, it is
diffed against the commit tree, and if there are two parents, they are
diffed against each other. I assume the order they are diffed in doesn't
matter.

-- 
Jonas Fonseca

[-- Attachment #2: show-changed-files.patch --]
[-- Type: text/plain, Size: 2533 bytes --]

Index: githelp.sh
===================================================================
--- 7de71a831508e51e0985cea173f3f7a7012c82b7/githelp.sh  (mode:100755 sha1:e19176d4ef69a2e7c6da6b9893e2c7cdac24760a)
+++ uncommitted/githelp.sh  (mode:100755)
@@ -19,7 +19,7 @@
 	fork		BNAME BRANCH_DIR [COMMIT_ID]
 	help
 	init		RSYNC_URL
-	log		[-c] [COMMIT_ID | COMMIT_ID:COMMIT_ID]
+	log		[-c] [-f] [COMMIT_ID | COMMIT_ID:COMMIT_ID]
 	ls		[TREE_ID]
 	lsobj		[OBJTYPE]
 	lsremote
Index: gitlog.sh
===================================================================
--- 7de71a831508e51e0985cea173f3f7a7012c82b7/gitlog.sh  (mode:100755 sha1:a09ffda484bd859a3bcb1cffcc8fd6f9c65fa8e7)
+++ uncommitted/gitlog.sh  (mode:100755)
@@ -13,8 +13,11 @@
 #	header		Green	
 #	author 		Cyan
 #	committer	Magenta
+#	files		Blue
 #	signoff		Yellow
 #
+# Takes an -f option to show which files were changed.
+#
 # Takes an id resolving to a commit to start from (HEAD by default),
 # or id1:id2 representing an (id1;id2] range of commits to show.
 
@@ -24,16 +27,25 @@
 	colheader="$(tput setaf 2)"    # Green
 	colauthor="$(tput setaf 6)"    # Cyan
 	colcommitter="$(tput setaf 5)" # Magenta
+	colfiles="$(tput setaf 4)"     # Blue
 	colsignoff="$(tput setaf 3)"   # Yellow
 	coldefault="$(tput op)"        # Restore default
 else
 	colheader=
 	colauthor=
 	colcommitter=
+	colfiles=
 	colsignoff=
 	coldefault=
 fi
 
+if [ "$1" = "-f" ]; then
+	shift
+	list_files=1
+else
+	list_files=
+fi
+
 if echo "$1" | grep -q ':'; then
 	id1=$(commit-id $(echo "$1" | cut -d : -f 1)) || exit 1
 	id2=$(commit-id $(echo "$1" | cut -d : -f 2)) || exit 1
@@ -49,6 +61,8 @@
 
 $revls | $revsort | while read time commit parents; do
 	[ "$revfmt" = "rev-list" ] && commit="$time"
+	tree1=
+	tree2=
 	echo $colheader""commit ${commit%:*} $coldefault;
 	cat-file commit $commit | \
 		while read key rest; do
@@ -73,11 +87,32 @@
 				fi
 				;;
 			"")
+				if [ -n "$list_files" ]; then
+					sep=
+					echo
+					echo -n "    * $colfiles"
+					diff-tree -r $tree1 $tree2 | \
+					while read modes type sha1s file; do
+						echo -n "$sep$file"
+						sep=", "
+					done
+					echo "$coldefault"
+				fi
 				echo; sed -re '
 					/ *Signed-off-by:.*/Is//'$colsignoff'&'$coldefault'/
 					s/^/    /
 				'
 				;;
+			"tree"|"parent")
+				if [ -z $tree1 ]; then
+					tree1=$rest
+				elif [ -z $tree2 ]; then
+					tree2=$rest
+				else
+					tree1=$rest
+				fi
+				echo $colheader$key $rest $coldefault
+				;;
 			*)
 				echo $colheader$key $rest $coldefault
 				;;


* [PATCH] Add githelp.sh to Makefile.
From: Steven Cole @ 2005-04-25  1:17 UTC
  To: Petr Baudis; +Cc: git

With this recent change:
	Separate the git help command to githelp.sh
We now need to install githelp.sh.  Added same to Makefile.

Signed-off-by: Steven Cole <elenstev@mesatop.com>

Index: Makefile
===================================================================
--- 7de71a831508e51e0985cea173f3f7a7012c82b7/Makefile  (mode:100644 sha1:0bbdbee6b6925b64af476de3cebde9b02f9b03ca)
+++ uncommitted/Makefile  (mode:100644)
@@ -36,7 +36,7 @@
 	gitmerge.sh gitpull.sh gitrm.sh gittag.sh gittrack.sh gitexport.sh \
 	gitapply.sh gitcancel.sh gitXlntree.sh gitlsremote.sh \
 	gitfork.sh gitinit.sh gitseek.sh gitstatus.sh gitpatch.sh \
-	gitmerge-file.sh
+	gitmerge-file.sh githelp.sh
 
 COMMON=	read-cache.o
 


* Re: Git-commits mailing list feed.
From: David A. Wheeler @ 2005-04-25  1:01 UTC
  To: Paul Jakma
  Cc: Linus Torvalds, Sean, Thomas Glanzmann, David Woodhouse,
	Jan Dittmer, Greg KH, Kernel Mailing List, Git Mailing List
In-Reply-To: <Pine.LNX.4.62.0504250008370.14200@sheen.jakma.org>




On Sat, 23 Apr 2005, Linus Torvalds wrote:
>> That means that we don't "strip them off", because dammit, they DO NOT
>> EXIST as far as git is concerned. This is why a tag-file will _always_
>> start with
>>
>>     commit <commit-sha1>
>>     tag <tag-name>
>>
>> because that way we can use fsck and validate reachability and have 
>> things that want trees (or commits) take tag-files instead, and git 
>> will automatically look up the associated tree/commit. And it will do 
>> so _without_ having to understand about signing, since signing is for 
>> trust between _people_ not for git.
>>
>> And that is why I from the very beginning tried to make it very clear
>> that the signature goes at the end. Not at the beginning, not in the
>> middle, and not in a different file. IT GOES AT THE END.

It may be better to have them as simple detached signatures, which are
completely separate files (see gpg --detached).
Yeah, gpg currently implements detached signatures
by repeating what gets signed, which is unfortunate,
but the _idea_ is the right one.


Paul Jakma wrote:
> Ideally, there'd be an index of signature objects by the SHA-1 sum of 
> the object they sign, as the signed object should not refer to the 
> signature (or the second of the above is not possible).

Yes, and see my earlier posting.  It'd be easy to store signatures in
the current objects directory, of course.  The trick is to be able
to go from signed-object to the signature; this could be done
just by creating a subdirectory using a variant of
the name of the signed-object's file, and in that directory store the
hash values of the signatures.  E.G.:
  00/
     3b128932189018329839019          <- object to sign
     3b128932189018329839019.d/
     0143709289032890234323451
  01/
     43709289032890234323451          <- signature

> The latter of the two points would, in combination with the former, 
> allow for cryptographic 'signed-off-by' chains. If a 'commit' is signed 
> by $RANDOM_CONTRIBUTOR and $SUBSYSTEM_MAINTAINER and $ANDREW, you know 
> it's time to pull it. Would also work for things like "fixes only" trees, 
> where (say) a change must be approved by X/2+1 of a group of X hackers 
> providing oversight -> looking up the commit object's signatures would 
> tell you whether it was approved.

Right.  Lots of tricks you can do once the signatures are there,
such as checking to counter repository subversion
(did everything get signed), finding out who introduced a malicious
line of code (& "proving" what key signed it first), etc.
There are LOTS of reasons for storing signatures so that they can
be checked later on, just like there are lots of reasons for storing
old code... they give you evidence that the reputed history is true
(and if you doubt it, they give you a way to limit the doubt).

--- David A. Wheeler


* Re: Where to download gitweb.pl from?
From: Randy.Dunlap @ 2005-04-25  0:47 UTC
  To: Thomas Glanzmann; +Cc: git
In-Reply-To: <20050425003834.GG10806@cip.informatik.uni-erlangen.de>

On Mon, 25 Apr 2005 02:38:34 +0200 Thomas Glanzmann wrote:

| Hello,
| I am looking for the current sources of gitweb.pl, anyone?

ftp://ehlo.org/ ...

ftp://ehlo.org/gitweb.pl

---
~Randy


* Re: Where to download gitweb.pl from?
From: Christian Meder @ 2005-04-25  0:45 UTC
  To: Thomas Glanzmann; +Cc: GIT
In-Reply-To: <20050425003834.GG10806@cip.informatik.uni-erlangen.de>

On Mon, 2005-04-25 at 02:38 +0200, Thomas Glanzmann wrote:
> Hello,
> I am looking for the current sources of gitweb.pl, anyone?

ftp://ehlo.org/gitweb.pl


			Christian
-- 
Christian Meder, email: chris@absolutegiganten.org

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

                (Eihei Dogen Zenji)



* Where to download gitweb.pl from?
From: Thomas Glanzmann @ 2005-04-25  0:38 UTC
  To: GIT

Hello,
I am looking for the current sources of gitweb.pl, anyone?

Greetings,
	Thomas


* keyword expansion
From: Thomas Glanzmann @ 2005-04-25  0:23 UTC
  To: GIT

Hello,
I am aware that keyword expansion is at the moment at the very bottom of
the todo list. However, I need it. Does anyone have something ready to
use? I am looking for the following information:

	- Time stamp of the last modification of a file
	- Last committer/author of the file

What I want is a script which runs after an export, checks for
keywords in files and expands them using information extracted from
the tree.  I would be grateful for any pointers/shell snippets.

I just migrated my mutt vendor tracking tree to git and it works quite
well. Thanks for all the effort!

Greetings,
	Thomas


* Re: [PATCH] PPC assembly implementation of SHA1
From: linux @ 2005-04-25  0:16 UTC
  To: paulus; +Cc: git, linux
In-Reply-To: <17003.9009.226712.220822@cargo.ozlabs.ibm.com>

> Yes. :)  In previous experiments (in the context of trying different
> ways to do memcpy) I found that doing unaligned word loads is faster
> than doing aligned loads plus extra rotate and mask instructions to
> get the bytes you want together.

The PPC970, at least, supports unaligned loads within one cache line
(64 bytes for L1 hit; 32 bytes for L1 miss) directly.  If the load
crosses the line, the processor backs up and re-issues it as two
loads and a merge.

Multiple-word loads can really suffer from this, as when the fault
hits, the *entire instruction* is aborted and re-issued as a series
of aligned loads and merges.

But for a single load, it's probably cheaper on average to use the
hardware 15 times out of 16 and take the retry the 16th.
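
The 15-in-16 figure can be checked with a small counting sketch, assuming
a stream of consecutive 4-byte loads from a misaligned start address.
`count_split_loads` is a hypothetical helper written for this example; it
is not part of the SHA1 patch.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many of nwords consecutive 4-byte loads, the first one at byte
 * address start, straddle a boundary between cache lines of line bytes. */
size_t count_split_loads(size_t start, size_t nwords, size_t line)
{
    size_t i, split = 0;

    assert(line >= 4);
    for (i = 0; i < nwords; i++) {
        size_t first = start + 4 * i; /* first byte of this word */
        size_t last = first + 3;      /* last byte of this word */

        if (first / line != last / line)
            split++;                  /* word spans two cache lines */
    }
    return split;
}
```

With 64-byte lines and a start address that is off by one byte, one load
in every 16 crosses a line; with 32-byte lines it is two in 16.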

>> But I came up with a few additional refinements:
>>
>> - You are using three temporaries (%r0, %r6, and RT(x)) for your
>>   round functions.  You only need one temporary (%r0) for all the functions.
>>   (Plus %r15 for k)

> The reason I used more than one temporary is that I was trying to put
> dependent instructions as far apart as reasonably possible, to
> minimize the chances of pipeline stalls.  Given that the 970 does
> register renaming and out-of-order execution, I don't know how
> essential that is, but it can't hurt.

It's a good general idea, but the PPC970 only has two integer ALUs, so
it can't get too clever.

>> All are three logical instructions on PPC.  The second form
>> lets you add it into the accumulator e in two pieces:

> A sequence of adds into a single register is going to incur the
> 2-cycle latency between generation and use of a value; i.e. the adds
> will only issue on every second cycle.  I think we are better off
> making the dataflow more like a tree than a linear chain where
> possible.

Grumble, complain... you're right.  I didn't know it had a 2-cycle
dependency.  Time to reschedule those inner loops.  Still, the multi-input
sum representation gives you a lot of scheduling flexibility.

Given this, it has to be scheduled 4-wide, which is a lot trickier.
The steps are 9, 8, and 10 instructions long (plus 4 instructions
for UPDATEW on most of them), and there's a 5- or 6-input sum to
compute in that time.

I'll stare at the dependency graph a bit and see if I can do better.

>> - You don't need to decrement %r1 before saving registers.
>>   The PPC calling convention defines a "red zone" below the
>>   current stack pointer that is guaranteed never to be touched
>>   by signal handlers or the like.  This is specifically for
>>   leaf procedure optimization, and is at least 224 bytes.

> Not in the ppc32 ELF ABI - you are not supposed to touch memory below
> the stack pointer.  The kernel is more forgiving than that, and in
> fact you can currently use the red zone without anything bad
> happening, but you really shouldn't.

Oh!  I didn't know that!  Thank you for enlightening me!

>> - Is that many stw/lwz instructions faster than stmw/lmw?
>>   The latter is at least more cache-friendly.

> I believe the stw/lwz and the stmw/lmw will actually execute at the
> same speed on the 970, but I have seen lwz/stw go faster than lmw/stmw
> on other machines.  In any case we aren't executing the prolog and
> epilog as often as the instructions in the main loop, hopefully.

Yes, and reducing the I-cache footprint seems useful.

>> With all of the above changes, your sha1ppc.S file turns into:

> I added a stwu and an addi to make a stack frame, and changed %r15 to
> %r5 as you mentioned in another message.  I tried it in a little test
> program I have that calls SHA1_Update 256,000 times with a buffer of
> 4096 zero bytes, i.e. it processes 1000MB.  Your version seems to be
> about 2% faster; it took 4.53 seconds compared to 4.62 for mine.  But
> it also gives the wrong answer; I haven't investigated why.

Grumble, damn.  The standard document has all the register values for
every round, so if you set up the same plaintext they use and step
through the code with that at your side, the problem should jump
out at you.

Anyway, now that I know the pipeline issues better, I'll try to
reschedule it and see if I can improve the situation.  I may have
to understand the dispatch group issues pretty thoroughly, too.

http://www.alphaworks.ibm.com/tech/simppc
might be of interest.

I'll try to find the bug while I'm at it.  Would you be willing to
benchmark some code for me?

Thanks!


* Re: Git-commits mailing list feed.
From: Paul Jakma @ 2005-04-24 23:57 UTC
  To: Linus Torvalds
  Cc: Sean, Thomas Glanzmann, David Woodhouse, Jan Dittmer, Greg KH,
	Kernel Mailing List, Git Mailing List
In-Reply-To: <Pine.LNX.4.62.0504250008370.14200@sheen.jakma.org>

On Mon, 25 Apr 2005, Paul Jakma wrote:

> Ideally, there'd be an index of signature objects by the SHA-1 sum of the 
> object they sign, as the signed object should not refer to the signature (or 
> the second of the above is not possible).

Ah, this could (obviously) be done generally by providing a general 
index of 'referrals' (if desirable).

I have no idea whether git already does this; I haven't checked it out 
yet, but I'm very interested to see how git will mature and have been 
trying to follow its progress - I'm a frustrated admin of a CVS 
repository.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Does the name Pavlov ring a bell?


* Re: Git-commits mailing list feed.
From: Paul Jakma @ 2005-04-24 23:25 UTC
  To: Linus Torvalds
  Cc: Sean, Thomas Glanzmann, David Woodhouse, Jan Dittmer, Greg KH,
	Kernel Mailing List, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504231234550.2344@ppc970.osdl.org>

On Sat, 23 Apr 2005, Linus Torvalds wrote:

> NO.
>
> Guys, I will say this once more: git will not look at the signature.
>
> That means that we don't "strip them off", because dammit, they DO NOT
> EXIST as far as git is concerned. This is why a tag-file will _always_
> start with
>
> 	commit <commit-sha1>
> 	tag <tag-name>
>
> because that way we can use fsck and validate reachability and have 
> things that want trees (or commits) take tag-files instead, and git 
> will automatically look up the associated tree/commit. And it will 
> do so _without_ having to understand about signing, since signing 
> is for trust between _people_ not for git.

> And that is why I from the very beginning tried to make it very 
> clear that the signature goes at the end. Not at the beginning, not 
> in the middle, and not in a different file. IT GOES AT THE END.

Actually, can you make the signature be detached and a separate 
object? I.e., add a signature object in its own right, distinct from 
tag. They could then:

- be used to sign any kind of object
- allow objects to be signed by multiple people

Ideally, there'd be an index of signature objects by the SHA-1 sum of 
the object they sign, as the signed object should not refer to the 
signature (or the second of the above is not possible).

The latter of the two points would, in combination with the former, 
allow for cryptographic 'signed-off-by' chains. If a 'commit' is 
signed by $RANDOM_CONTRIBUTOR and $SUBSYSTEM_MAINTAINER and $ANDREW, 
you know it's time to pull it. Would also work for things like "fixes 
only" trees, where (say) a change must be approved by X/2+1 of a 
group of X hackers providing oversight -> looking up the commit 
object's signatures would tell you whether it was approved.
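
A rough sketch of the X/2+1 check, assuming a hypothetical flat index of
(signed object, signer) records whose signatures have already been
verified by some other tool. None of these names exist in git; the record
layout is invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical index entry: one detached signature over one object. */
struct sig_record {
    const char *signed_object; /* hex SHA-1 of the object that was signed */
    const char *signer;        /* key ID of whoever signed it */
};

/* Count how many distinct members of group have signed object.
 * Group members are assumed to be listed once each. */
size_t count_approvals(const struct sig_record *index, size_t nsigs,
                       const char *object,
                       const char *const *group, size_t group_size)
{
    size_t g, s, approvals = 0;

    for (g = 0; g < group_size; g++) {
        for (s = 0; s < nsigs; s++) {
            if (strcmp(index[s].signed_object, object) == 0 &&
                strcmp(index[s].signer, group[g]) == 0) {
                approvals++;
                break; /* repeated signatures by one member count once */
            }
        }
    }
    return approvals;
}

/* A change is approved once X/2+1 of the X group members have signed it. */
int is_approved(const struct sig_record *index, size_t nsigs,
                const char *object,
                const char *const *group, size_t group_size)
{
    assert(group_size > 0);
    return count_approvals(index, nsigs, object, group, group_size)
        >= group_size / 2 + 1;
}
```

Because the index is keyed by the signed object's hash, the signed object
itself never has to mention its signatures, and signatures from people
outside the oversight group are simply ignored.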

No idea whether this is possible or practical. :) But it would be 
good for future flexibility to avoid including the signature in the 
object being signed.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
You give me space to belong to myself yet without separating me 
from your own life.  May it all turn out to your happiness.
 		-- Goethe


* Re: Hash collision count
From: linux @ 2005-04-24 23:16 UTC
  To: jgarzik; +Cc: git

*Sigh*.

> A collision -will- occur eventually, and it is trivial to avoid this 
> problem:

Yes, it will occur *eventually*.  Let's put some numbers on this
"eventually" business.

The earth's sun will run out of hydrogen and become a red giant in about
6 billion years, 1.3 * 2^57 seconds.  (Many people say 5 billion, but
I'll round up for safety.)  Suppose we add 1 file per second to our
repository until we are interrupted by the sun's photosphere engulfing
the planet and melting all our computers.

That's a total of n*(n-1)/2 = 1.7 * 2^113 pairs of files which can
possibly collide.  I'll round up to 2^114 for simplicity.

Assuming we have a good uniform hash function, the chance that any given
pair of different file versions produces an identical hash is 1/2^160.

The chance that there are no collisions is (1-1/2^160)^(2^114).  What is
this numerically?  Well, (1-a)*(1-b) = 1 - a - b + a*b.  When multiplying
probabilities very close to 1 (a and b small), the product is
a little bit larger than (1-(a+b)).

Likewise, when exponentiating probabilities very close to 1, (1-a)^n is
a bit larger than 1-n*a.

Thus, the probability of no collisions is a bit larger than (1 - 2^114/2^160),
or (1 - 2^-46).  The probability of one or more collisions is a bit *less*
than 2^-46, 1 in 70 trillion.


With odds like that, I submit that it's not worth fixing; the odds
of introducing a bug are far higher than that.


If you can't find a hard drive large enough to store 183 quadrillion
file versions, the probability of a collision decreases as the square of
the fraction of that number you do store.  For example, if you only have
400 billion versions, the chance of a collision is around 2^-84.
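
The two estimates above can be reproduced with a few lines of arithmetic.
This is only a sketch of the pair-counting bound, assuming an ideal
160-bit hash and 31557600 seconds per year; `collision_upper_bound` is a
name made up for the example.

```c
#include <assert.h>

/* Upper bound on the chance of at least one collision among n_objects
 * distinct inputs to an ideal hash with hash_bits bits of output:
 * (number of pairs) / 2^hash_bits. */
double collision_upper_bound(double n_objects, int hash_bits)
{
    double pairs, space = 1.0;
    int i;

    assert(n_objects >= 1.0 && hash_bits > 0);
    pairs = n_objects * (n_objects - 1.0) / 2.0;
    for (i = 0; i < hash_bits; i++)
        space *= 2.0; /* powers of two are exact in a double */
    return pairs / space;
}
```

Calling it with 6e9 * 31557600.0 objects and 160 bits gives a bound a
little under 2^-46, and with 4e11 objects a bound close to 2^-84.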

We have a fair bit of safety margin in our "good uniform hash" assumption.

Running out of 32-bit inode number space is a *far* more urgent problem.

Not to mention that over the next 1000 years of kernel maintainers, one
of them may find something they like better than git.


* Re: [RFC] Design of name-addressed data portion
From: Daniel Barkalow @ 2005-04-24 23:12 UTC
  To: Fabian Franz; +Cc: git, Linus Torvalds, Petr Baudis
In-Reply-To: <200504250058.15901.FabianFranz@gmx.de>

On Mon, 25 Apr 2005, Fabian Franz wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Am Sonntag, 24. April 2005 20:17 schrieb Daniel Barkalow:
> > I'd propose the following structure:
> >
> > [...]
> >    tags/     the tags
> >      ...     files with the symbolic name of the tags, containing the hash
> 
> Couldn't you use symbolic or hard links here and in references/?

For most uses of the refs/ directory (of which tags/ is a subdirectory),
we want to get from it the hash, not just the contents of the referenced
object, and we potentially want to get the hash from something like a web
server. Finding out what http://.../foo.git/refs/heads/DEFAULT is a
symlink (or, worse, hard link) to, so that you can decide if it's different
from what you have, would be a major pain.

	-Daniel
*This .sig left intentionally blank*



* Re: git.git object database at kernel.org?
From: Linus Torvalds @ 2005-04-24 23:08 UTC
  To: Git Mailing List, Junio C Hamano
In-Reply-To: <7vhdhvstb2.fsf@assigned-by-dhcp.cox.net>


Junio just pointed out that "convert-cache" didn't actually handle some of 
the old-style commits that git itself has (notably, the date changes).

I fixed that, and converted git, so that the dates are now correct in "git
log" for the early commit entries too.

This means that all the commit objects in my git tree ended up being
re-generated: the data itself doesn't change, and none of the tree or blob
objects are any different, but since the first "commit" changed (the first
several ones, in fact - it was about a week until the date format got
fixed), the whole chain of commits ends up being different.

That has almost zero impact _except_ for anybody who merges directly with 
my tree using git itself. You now have two choices:

 - convert your own git tree (probably a good thing, otherwise your old 
   commit entries will always be bogus), at which point it should just 
   merge fine with mine automatically.

   NOTE! The fact that "mktime()" seems to depend on the timezone in which 
   it is made seems to make this questionable. I had always assumed that 
   mktime would take the timezone from the "struct tm", and thus be 
   reliable, but somebody seems to have shown that that is not the case at 
   all!

   It's entirely possible that you need to do something stupid like

	TZ=US/Pacific

   before you convert your tree. I wonder if this might also explain the 
   problems some people (notably Russell) had at around the conversion..

   Anyway, _if_ your conversion was successful and matches mine, you
   should now have a root "e83c5163316f89bfbde7d9ab23ca2e25604af290", that
   is reachable from the result of the conversion (check with "git log
   <result>"). You should also have as a top (unless you have made changes 
   of your own):

	commit 3f053897b3445988309d0ae7378944783c34d152
	tree f5c350ae39f61486622c84597a507611e62fa6af
	parent c6e007b0942a373bbf87fa3e4e11e2d90907de8c
	author Linus Torvalds <torvalds@ppc970.osdl.org> Sun Apr 24 22:49:09  2005
	committer Linus Torvalds <torvalds@ppc970.osdl.org> Sun Apr 24 22:49:09  2005
	
	  Update "convert-cache" to handle git itself.
	  
	  The git archives have some old-date-format commits with timezones
	  that the converter didn't recognize. Also, make it be quiet about
	  already-converted dates.

   If it doesn't match that, you can run "convert-cache" (with the
   _original_ head) several times. When you are happy that it's all ok,
   you can "commit" the result by writing it to your .git/HEAD file.

 - alternatively, if you don't want to convert your thing, you can give a 
   "base commit" by hand when merging, and select one where the "tree"  
   object matches in both mine and yours. This isn't something I would 
   recommend, but it should work and "meld" the two commit chains together 
   even though their root entries differ.
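
The mktime() behaviour mentioned in the NOTE above can be sketched as follows.
This is a Unix-only illustration (it relies on `time.tzset`), and the helper
name `mktime_in_zone` is made up for the example. The POSIX zone strings
`UTC0` and `PST8` are used so the result does not depend on installed zone
data; `PST8` is only a stand-in for a Pacific-like offset without DST.

```python
import calendar
import os
import time


def mktime_in_zone(broken_down, tz_name):
    """Run time.mktime() with TZ set to tz_name, then restore the old TZ."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz_name
    time.tzset()
    try:
        return int(time.mktime(broken_down))
    finally:
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()


# Sun Apr 24 22:49:09 2005; the last field (-1) lets the library decide on DST.
stamp = (2005, 4, 24, 22, 49, 9, 6, 114, -1)

as_utc = mktime_in_zone(stamp, "UTC0")
as_pacific = mktime_in_zone(stamp, "PST8")

# The same broken-down time yields different epoch values per zone, because
# mktime() interprets it as local time. calendar.timegm() always assumes UTC.
offset_seconds = as_pacific - as_utc
utc_reference = calendar.timegm(stamp)
```

A converter that wants zone-independent results would presumably parse the
offset from the date string and use a UTC-based conversion instead.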

Sorry about all this, I had totally forgotten that we had old-style dates 
in our git commit history.

			Linus


* Re: [RFC] Design of name-addressed data portion
From: Fabian Franz @ 2005-04-24 22:58 UTC (permalink / raw)
  To: Daniel Barkalow, git; +Cc: Linus Torvalds, Petr Baudis
In-Reply-To: <Pine.LNX.4.21.0504241336250.30848-100000@iabervon.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am Sonntag, 24. April 2005 20:17 schrieb Daniel Barkalow:
> I'd propose the following structure:
>
> [...]
>    tags/     the tags
>      ...     files with the symbolic name of the tags, containing the hash

Couldn't you use symbolic or hard links here and in references/?

cu

Fabian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFCbCSHI0lSH7CXz7MRAmPDAJ95YVHaGWH3KIMhOrw035cAUZd+QgCfZqFa
8IAfnNgc8P6cx+W2+xNJ0P0=
=WGC/
-----END PGP SIGNATURE-----



* A darcs that can pull from git
From: Juliusz Chroboczek @ 2005-04-24 22:32 UTC (permalink / raw)
  To: darcs-devel; +Cc: Git Mailing List

I've just finished putting together a hack for darcs to allow it to
pull from Git repositories.  You'll find the patch (Darcs patch, not
diff patch) on

  http://www.pps.jussieu.fr/~jch/software/files/darcs-git-20050424.darcs

You should get yourself a copy of darcs-unstable, then apply this
patch:

  $ darcs get http://www.abridgegame.org/repos/darcs-unstable darcs-git
  $ cd darcs-git
  $ darcs apply darcs-git-20050424.darcs
  $ make darcs

If you get merge conflicts, try using a version of the darcs-unstable
tree from 18.04.2005, which is what I started with.

A minor problem: there's something broken with the build procedure;
you'll probably need to manually do a ``make Context.hs'' followed
with ``make darcs'' when the build breaks.

After you build darcs-git, you should be able to do something like

  $ cd ..
  $ mkdir a
  $ cd a
  $ darcs initialize
  $ ../darcs-git/darcs pull /usr/local/src/git-pasky-0.4
  $ darcs changes

This version can *pull* from git, but it cannot push; in other words,
the only way to export your data from Darcs back to git is to use diff
and patch.

Please be aware that this is just a proof-of-concept prototype.  David
and the rest of the Central Committee haven't looked at this code yet;
it is quite likely that future versions of Darcs will generate
completely different patches from git repositories.  It is also likely
that THIS CODE WILL EAT YOUR DATA.

The major issue is that we generate no patch dependencies.  If you try
to cherry-pick from repositories generated with this version, you'd
better know what you're doing.

David, could you please have a look at the patches

  Sun Apr 24 16:50:02 CEST 2005  Juliusz Chroboczek <jch@pps.jussieu.fr>
    * First cut at remodularising repo access.

  Sun Apr 24 16:01:32 CEST 2005  Juliusz Chroboczek <jch@pps.jussieu.fr>
    * Change Repository to DarcsRepo.

and tell me whether this sort of restructuring is okay with you.

(David, I'm not claiming that this scheme is better than the ``tagging
like crazy'' scheme that you outlined; I'm only trying to prove that
my scheme is workable.)

Right now, I'm taking a Git commit and manually generating a Darcs
patch id from that, which is a bad idea.  A better way would be to get
Darcs to deal with arbitrarily shaped patch ids; a patch that
originates with git would get the git patch id, while a patch that
comes from Darcs would retain its patch id even when pushed to git.
David, you had some objections to that; any chance we could discuss
the issue?

This is slow.  There are a few obvious improvements to make to the
performance, but I'd rather first implement whatsnew, diff and apply,
and fix the problem with patch dependencies.  (Whatsnew is where git's
performance is actually likely to be better than Darcs, but it will
require some abstracting of ``Slurpy'' in order to make that
effective.)  Unfortunately, I don't expect to have hacking time before
next week-end.


Enjoy,

                                        Juliusz Chroboczek


* Re: First web interface and service API draft
From: Christian Meder @ 2005-04-24 22:29 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050422225733.GH21204@pasky.ji.cz>

On Sat, 2005-04-23 at 00:57 +0200, Petr Baudis wrote:
> Dear diary, on Fri, Apr 22, 2005 at 03:29:39PM CEST, I got a letter
> where Christian Meder <chris@absolutegiganten.org> told me that...
> > > > /<project>
> > > > 
> > > > Ok. The URI should start by stating the project name
> > > > e.g. /linux-2.6. This does bloat the URI slightly but I don't think
> > > > that we want to have one root namespace per git archive in the long
> > > > run. Additionally you can always put rewriting or redirecting rules at
> > > > the root level for additional convenience when there's an obvious
> > > > default project.
> > > > 
> > > > Should provide some meta data, stats, etc. if available.
> > > 
> > > I don't think this makes much sense. I think you should just apply -p1
> > > to all the directories, and define that there should be some / page
> > > which should contain some metadata regarding the repository you are
> > > accessing (probably branches, tags, and such).
> > 
> > Hi,
> 
> Hi,
> 
> > remember that I want to stay stateless as long as possible so everything
> > important has to be encoded in the url. So somewhere in the url the git
> > archive to show has to be encoded. If I remove the <project> portion how
> > do I know on the server side which repo to show ?
> 
> since you are configured appropriately.
> 
> You need to be anyway. Someone needs to tell you or your web server
> "this lives at http://pasky.or.cz/wit/". So you bind "this" to the
> given repository.
> 
> No problem with an additional configuration possibility to say "at that
> place, clone your life place for the given repositories", but if I want
> to have just a single repository at a given URL, it should be possible.
> 
> I'm just trying to argue that having it _forced_ to have <project> as
> the part of the URL is useless; this is matter of configuration.

Ok. Got it. <project> for a multi-repo setup and in the simple case of
just one repo <project> can be dropped from the url. Reasonable.
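
A rough sketch of that rule, assuming a hypothetical `split_request` helper
and a configuration that either names one default repository or maps project
names to repositories. None of these names come from the draft itself.

```python
def split_request(path, repos, default_repo=None):
    """Map a URL path to (repository, remaining path segments).

    With default_repo set, the path carries no <project> component.
    Otherwise the first segment must be a key of repos.
    """
    parts = [p for p in path.split("/") if p]
    if default_repo is not None:
        return default_repo, parts
    if not parts or parts[0] not in repos:
        raise LookupError("unknown project in " + path)
    return repos[parts[0]], parts[1:]


# Multi-repo setup: the project name leads the path.
multi = split_request("/linux-2.6/tree/abc", {"linux-2.6": "/srv/linux-2.6.git"})

# Single-repo setup: the same resource without the <project> prefix.
single = split_request("/tree/abc", {}, default_repo="/srv/linux-2.6.git")
```

Both calls resolve to the same repository and the same remaining path, so the
server stays stateless either way.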

> > > > * Blob data should be probably binary ?
> > > 
> > > What do you mean by binary?
> > 
> > content-type: binary/octet-stream
> 
> Ah. So just as-is, you mean?

Yes.

> 
> > > Anything wrong with putting ls-tree output there?
> > 
> > ls-tree output should be in .html (see below)
> 
> What if I actually want to process it by a script?

Use the .html variant and parse it. Or we add a .txt and/or .xml for
easier parsing.

> 
> > > > -------
> > > > /<project>/tree/<tree-sha1>
> > > > 
> > > > Tree objects are served in binary form. Primary audience are scripts,
> > > > etc. Human beings will probably get a heart attack when they
> > > > accidentally visit this URI.
> > > 
> > > Binary form is unusable for scripts.
> > 
> > Why should it be unusable for a downloading script. It's just the raw
> > git object.
> > 
> > > We should also have /gitobj/<sha1> for fetching the raw git objects.
> > 
> > Everything above is supposed to be raw git objects. No special encoding
> > whatever.
> 
> You have a consistency problem here.
> 
> Raw git objects as in database contain the leading object type at the
> start, then possibly some more stuff, then '\0' and then compressed
> binary stuff. You mean you are exporting _this_ stuff through this?
> 
> That's not very useful except for http-pull, if you ask me. It also does
> not blend well with the fact that you say commits are in text or so.

Ok. We spoke of two different things. With raw objects I meant the
uncompressed raw content while you spoke of the raw compressed git
objects. Ok I'm dumb but now that I've understood what you said I agree
with you: we need one generic url for fetching compressed objects.
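
To make the two meanings of "raw" concrete, here is a small sketch of a stored
object versus its plain content. It follows the loose-object layout used by
current git ("<type> <size>", a NUL byte, then the content, all
zlib-compressed); the on-disk details in April 2005 may have differed, and the
function names are invented for the example.

```python
import hashlib
import zlib


def encode_object(obj_type, content):
    """Return (hex id, stored bytes) for an object of the given type."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    raw = header + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_object(stored):
    """Return (type, content) from the compressed stored form."""
    raw = zlib.decompress(stored)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


object_id, stored = encode_object("blob", b"hello\n")
obj_type, content = decode_object(stored)
```

A generic URL for compressed objects would serve `stored` as-is, while the
per-type URLs would serve something derived from `content`.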

> 
> > > > -------
> > > > /<project>/tree/<tree-sha1>/diff/<ancestor-tree-sha1>/html
> > > > 
> > > > Non recursive HTML view of the objects which are contained in the diff
> > > > fully linked with the individual HTML views.
> > > 
> > > Why not .html?
> > 
> > I think .html isn't very clear because it would
> > be ..../<ancestor-tree-sha1>.html which somehow looks like it has
> > anything to do with the ancestor-tree. But it's the html version of the
> > _diff_ and not the ancestor-tree.
> 
> Perhaps /tree/<sha1>.html/diff/<ancestor> ?
> 
> I'd tend toward ?diff=<ancestor> more and more. The path part of URI is
> there to express _hierarchy_, I think you are abusing that when there is
> no hierarchy.

But I'd argue that you are abusing queries ;-)
After all any given URI of the above kind is linking a specific diff
resource. It's a completely static resource from a user POV. The fact
that the server is probably dynamically generating it is just an
implementation detail.

> 
> > > For consistency, I'd stay with the plaintext output by default, .html if
> > > requested.
> > 
> > Remember that I'm just sitting on top of git and not git-pasky right
> > now. So there's no canonical changelog plaintext output for me. But I'm
> > not religious about that.
> 
> But there is canonical HTML output for you? ;-)

No. Changelog isn't defined by git so there's no canonical output of any
flavour.

> > > OTOH, I'd use
> > > 
> > > 	/log/<commit>
> > > 
> > > to specify what commit to start at. It just does not make sense
> > > otherwise, you would not know where to start
> > 
> > Start for the changelog is always head, but I guess that's pretty
> > standard. With git log you always start at the head too.
> 
> If you are sitting on top of git and not git-pasky, you have no assured
> HEAD information at all.

I've got HEAD. I'm still watching the discussion of tags.

> > If you want to start at a specific commit. Why not start
> > at /linux-2.6/commit/<sha1>.html ?
> 
> And how does that give me the changelog?

You could click through the commit chain interactively or we could add a
changelog from here function.
 
> > > I think the <commit> should follow the same or similar rules as Cogito
> > > id decoding. E.g. to get latest Linus' changelog, you'd do
> > > 
> > > 	/log/linus
> > 
> > Like I said above I think the shown head should be encoded in the
> > project id.
> 
> I thought the project was mapped to repository? But I might just have
> blindly assumed that. ;-) (That does not make me like your approach
> more, though.)

Ok. I think I misunderstood you here. You want to publish the different
heads you are tracking with the same repo, right ?

The proposal didn't account for this scenario yet. I'll think about it.



				Christian

-- 
Christian Meder, email: chris@absolutegiganten.org

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

                (Eihei Dogen Zenji)



* Whales falling on houses - was: Hash collision count
From: Jon Seymour @ 2005-04-24 22:25 UTC (permalink / raw)
  To: Imre Simon
  Cc: Jeff Garzik, Petr Baudis, Ray Heasman, Git Mailing List,
	Linus Torvalds, Imre Simon
In-Reply-To: <68ff9fa6050424142416fbadcd@mail.gmail.com>

>
> 1. Take your favorite text file, at least 160 characters long.
> 2. Choose 160 positions in this file.
> 3. For each position choose your favorite misspelling of that character.
> 4. Produce all 2^160 text files, all of the same length, choosing for
> each position either the original or the alternate character
> 5. Add an arbitrary file of the same length, different from the above
> 
> Two of these files have the same sha1 hash. Or, for that matter, for
> any 160 bit  hash the same is true.

If you were to create those files at 10^9 files per second, it would
take you 10^38 years before you were in position to take step 5. I am
about to turn 38 this week. Would that I could live to 10^38.

It's absolute rubbish to say that the best solution from an
<double-quote>engineering</double-quote> point of view is to eliminate
the infinitesimal possibility of a collision. Engineering is all
about assessing risk and making suitable trade-offs. Every day of the
week, "real" engineers accept life-threatening risks that put
thousands of people's lives in danger. They do it because we live in a
world where risk cannot be eliminated, merely reduced to an acceptable
level.

I can't understand that you are prepared to drive a car or fly in a
Boeing or Airbus that has a demonstrated risk of killing you, yet you
want to insist on eliminating a risk that at most might create an
interesting Slashdot headline: "Jolt-crazed programmer finds SHA1
collision - but later dies when whale falls on house".

jon.
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/


* Re: unseeking?
From: Daniel Barkalow @ 2005-04-24 22:10 UTC (permalink / raw)
  To: Zack Brown; +Cc: Petr Baudis, git
In-Reply-To: <20050424213841.GD11094@tumblerings.org>

On Sun, 24 Apr 2005, Zack Brown wrote:

> On Sun, Apr 24, 2005 at 02:47:30PM -0400, Daniel Barkalow wrote:
> 
> So why not just do 'git init URL' to get the upstream sources, make your
> edits, do 'git pull' to track the upstream sources every once in
> awhile, and do 'git diff' when you're ready to send your changes to the
> upstream maintainer.
> 
> I think I've understood your explanation of what's actually happening,
> but I still don't see its significance. What do you get from a fork that
> you don't get from a regular old init and pull?

Primarily, the ability to inspect and build the mainline tree. If you want
to take a look at what's going on in the mainline without getting it again
or messing with your working directory or local commits, you can do that. 

Also, if you're doing two independent sets of edits, you can share the
downloads for updates between them. Say I'm working on an ambitious
project to do block-move cross-file merges in git. I've got a fork that
I'm working on that in. After I've done a bunch of work there, I notice a
bug report about some of my other code in the project. I fork off another
branch from the mainline to fix it in, so that I can ignore the fact that
I'm a dozen commits into this other thing, fix the bug, and ship off the
changes.

With fork, I save having to download the contents of the remote repository
again, because the object storage is shared. Also, I can merge my bug 
fixes into my long-term work without waiting for them to show up in the
mainline (although that makes the later merge potentially trickier) or, in
general, needing to transfer them between repositories.

The other main thing is the way that I actually split up patches. I have a
fork of the mainline. I make a second fork of the mainline. I diff
"second:first" to get the changes I need to split up, apply some of them,
commit, and then repeat. In order to diff "second:first", both have to be
stored in the same repository (because, otherwise, git won't be able to
find one or the other commit to look at). In the first iteration, when
second is mainline, it doesn't matter, but in later iterations I want to
get fewer and fewer changes to include or postpone, which requires using
commits from the splitting process.

	-Daniel
*This .sig left intentionally blank*



* Re: [FILE] Docs update
From: Linus Torvalds @ 2005-04-24 21:44 UTC (permalink / raw)
  To: David Greaves; +Cc: Petr Baudis, GIT Mailing Lists
In-Reply-To: <426BF790.9070406@dgreaves.com>



On Sun, 24 Apr 2005, David Greaves wrote:
> 
> And I've attached this as a file rather than a patch to make it easier 
> for people to read.

Suggestion: move "diff-tree" up above "diff-cache", since as it is now, 
you explain "diff-cache" in terms of diff-tree, before you've even 
explained diff-tree in the first place.

Also, the current diff-tree has an extension:

	################################################################
	diff-tree
	        diff-tree [-r] [-z] <tree/commit> <tree/commit> [pattern]*

	Compares the content and mode of the blobs found via two tree objects.

where the "pattern" arguments are the pathnames you are interested in 
seeing the differences of.

For example, if you're only interested in differences in some
architecture-specific files, you might do

	diff-tree -r <tree/commit> <tree/commit> arch/ia64 include/asm-ia64

and it will only show you what changed in those two directories.

Or if you are searching for what changed in just kernel/sched.c, just do

	diff-tree -r <tree/commit> <tree/commit> kernel/sched.c

and it will ignore all differences to other files.

The pattern is always the prefix, and is matched exactly (ie there are no
wildcards - although matching a directory, which it does support, can
obviously be seen as a "wildcard" for all the files under that directory).
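
The matching rule described above might be sketched like this. It is an
illustration of the stated semantics only, not diff-tree's actual code, and it
assumes a pattern matches either the exact path or a whole leading directory;
the function name `matches` is made up.

```python
def matches(path, patterns):
    """True if path equals a pattern or lies under a pattern directory."""
    if not patterns:
        return True  # no patterns given: every path is of interest
    for pat in patterns:
        pat = pat.rstrip("/")
        if path == pat or path.startswith(pat + "/"):
            return True
    return False


changed = [
    "arch/ia64/kernel/entry.S",
    "include/asm-ia64/page.h",
    "kernel/sched.c",
    "kernel/sched_debug.c",
]

# Only paths under the two ia64 directories survive the first filter.
ia64_only = [p for p in changed if matches(p, ["arch/ia64", "include/asm-ia64"])]

# A file pattern matches that one file and nothing else.
sched_only = [p for p in changed if matches(p, ["kernel/sched.c"])]
```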

		Linus


* Re: unseeking?
From: Zack Brown @ 2005-04-24 21:38 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.21.0504241418190.30848-100000@iabervon.org>

On Sun, Apr 24, 2005 at 02:47:30PM -0400, Daniel Barkalow wrote:
> On Sun, 24 Apr 2005, Zack Brown wrote:
> > 4) In normal work-flow, when would forks be created, as opposed to other ways
> > of getting a tree?
> 
> I have a tree that I want to modify, but I want to keep the original, and
> I may want to update the original from an upstream source (and then sync
> my work with it).

So why not just do 'git init URL' to get the upstream sources, make your edits,
do 'git pull' to track the upstream sources every once in awhile, and do 'git
diff' when you're ready to send your changes to the upstream maintainer.

I think I've understood your explanation of what's actually happening, but I
still don't see its significance. What do you get from a fork that you don't get
from a regular old init and pull?

Be well,
Zack

> I start with the original:
> 
>   cd original
>   git init URL
>   git addremote remote-source URL
>   git track remote-source
> 
> I make my own working directory:
> 
>   git fork my-changes ../my-changes
>   cd ../my-changes
> 
> Then I do my changes, and commit whenever I feel like I've gotten
> somewhere (or when I think I'm about to mess something up and might want
> to undo changes). Periodically, I check on the mainline:
> 
>   cd ../original
>   git pull
> 
> I also merge changes from the mainline:
> 
>   cd ../my-changes
>   git merge remote-source
> 
> When I'm done, I make a patch for my work:
> 
>   cd ../my-changes
>   git patch remote-source
> 
> I generally then fork the original again, split the patch, apply each
> section in the new fork, committing after each one, generate patches for
> each of these commits, and send those out. Then I discard my old branch
> and continue from the new one. If, at some point, all of the changes I
> want to keep have been put into the mainline, I discard all my branches
> and fork again from the mainline.
> 
> (My personal style is to discard the history of how the changes got made
> in favor of the history of how the changes got into the mainline, since I
> don't really need to keep all of my debugged mistakes that nobody else
> saw.)
> 
> 	-Daniel
> *This .sig left intentionally blank*

-- 
Zack Brown


