Git development
* [PATCH] Print an error if cloning a http repo and NO_CURL is set
From: Fernando J. Pereda @ 2006-02-15 11:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

If Git is compiled with NO_CURL=YesPlease and one tries to
clone a http repository, git-clone tries to call the curl
binary. This trivial patch prints an error instead in such
situation.

Signed-off-by: Fernando J. Pereda <ferdy@gentoo.org>

---

 Makefile     |    1 +
 git-clone.sh |    8 +++++++-
 2 files changed, 8 insertions(+), 1 deletions(-)

896d96a92a13848ccce19c2f3dee9b5570ef02a7
diff --git a/Makefile b/Makefile
index d40aa6a..648469e 100644
--- a/Makefile
+++ b/Makefile
@@ -419,6 +419,7 @@ $(patsubst %.sh,%,$(SCRIPT_SH)) : % : %.
 	rm -f $@
 	sed -e '1s|#!.*/sh|#!$(call shq,$(SHELL_PATH))|' \
 	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
+	    -e 's/@@NO_CURL@@/$(NO_CURL)/g' \
 	    $@.sh >$@
 	chmod +x $@
 
diff --git a/git-clone.sh b/git-clone.sh
index 47f3ec9..e192b08 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -206,7 +206,13 @@ yes,yes)
 		fi
 		;;
 	http://*)
-		clone_dumb_http "$repo" "$D"
+		if test -z "@@NO_CURL@@"
+		then
+			clone_dumb_http "$repo" "$D"
+		else
+			echo >&2 "http transport not supported, rebuild Git with curl support"
+			exit 1
+		fi
 		;;
 	*)
 		cd "$D" && case "$upload_pack" in
-- 
1.2.0


* Re: Use case: GIT to manage transactions in a CMS?
From: Andreas Ericsson @ 2006-02-15 11:45 UTC (permalink / raw)
  To: "J. David Ibáñez"; +Cc: git
In-Reply-To: <43F30602.500@itaapy.com>

J. David Ibáñez wrote:
> Hello,
> 
> I am working on a project (a content management system) where the data
> is stored as files and folders.
> 
> Currently, for persistence and transactions we use the ZODB [1] object
> database. But our goal is to move away from the ZODB and use directly
> the file system, as it will allow us to use all the good unix tools.
> 
> We are using git to manage the source code. And now we are exploring git
> to see if it can do the job of transactions, so that each transaction in
> the system will be a git commit.
> 
> One problem we have found is that we cannot commit empty directories (which
> we need to do). Any idea how to solve or work around this constraint?
> 

$ touch empty/dir/.placeholder
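
The one-liner above can be generalized to a whole tree. The following is
only a sketch: the function name `add_placeholders` and the file name
`.placeholder` are illustrative and not part of git, and it assumes a
`find` that supports `-empty` (GNU and BSD find do; strict POSIX find
does not).

```shell
#!/bin/sh
# Sketch: drop a placeholder file into every empty directory below $1,
# skipping anything inside .git. All names here are illustrative.
add_placeholders() {
	root=${1:-.}
	# Prune .git so the repository's own directories are left alone,
	# then list every remaining directory that has no entries.
	find "$root" -name .git -prune -o -type d -empty -print |
	while IFS= read -r dir
	do
		: >"$dir/.placeholder"
	done
}
```

Running `add_placeholders .` before a commit would make otherwise empty
directories show up as tracked content; directory names containing
newlines are not handled by this sketch.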


> Any suggestions and input on this exotic use case for git would be very
> welcome.
> 

Sounds cool. I'll have to give it a whirl when you've got something to show.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Shared repositories and umask
From: Martin Mares @ 2006-02-15 12:19 UTC (permalink / raw)
  To: git

Hello, world!\n

I'm playing with a shared repository and I am still unable to get the
file and directory permissions kept correctly, that is writeable to
a group.

Setting the `core.sharedrepository' flag helps a bit, but not completely:
the object files and directories are group-writeable, but for example new
head refs aren't.

The documentation hints on setting umask, but I would really like to avoid
doing that globally, because the user accounts are used for many other
things as well where the permissions should be tighter.

It seems that a correct solution would be to add a `umask' option to
the repository config and make enter_repo() adjust the umask accordingly.

I was thinking about doing the same in setup_git_directory() for the
local commands, but that probably doesn't make much sense since many commands
are in fact scripts creating files themselves.

If you agree, I will send a patch.

				Have a nice fortnight
-- 
Martin `MJ' Mares   <mj@ucw.cz>   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Q: How many Prolog programmers does it take to change a light bulb?  A: No.


* Re: Shared repositories and umask
From: Petr Baudis @ 2006-02-15 13:05 UTC (permalink / raw)
  To: Martin Mares; +Cc: git
In-Reply-To: <mj+md-20060215.120104.14337.atrey@ucw.cz>

  Hi,

Dear diary, on Wed, Feb 15, 2006 at 01:19:07PM CET, I got a letter
where Martin Mares <mj@ucw.cz> said that...
> I'm playing with a shared repository and I am still unable to get the
> file and directory permissions kept correctly, that is writeable to
> a group.
> 
> Setting the `core.sharedrepository' flag helps a bit, but not completely:
> the object files and directories are group-writeable, but for example new
> head refs aren't.

  actually, this is not necessary, since when pushing to shared
repositories, the new ref is created in the directory as a lockfile and
then moved over the original ref - this makes the ref updating safe and
raceless, while also making it enough to have the refs directory
group-writable.

  Therefore, it shouldn't be actually necessary to meddle with umask
anymore. The documentation is obsolete; I'll remove the relevant bits
from Cogito docs.
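
The lock-and-rename update described above can be illustrated in shell.
This is a sketch of the idea only, not git's actual code:
`update_ref_atomically` is a hypothetical helper, and the shell's
noclobber option (`set -C`) stands in for an exclusive file create.

```shell
#!/bin/sh
# Sketch: update a ref by writing "<ref>.lock" and renaming it into place.
update_ref_atomically() {
	ref=$1 new=$2
	# With noclobber set, the redirection fails if the lock file already
	# exists, so only one writer can hold the lock at a time.
	( set -C; printf '%s\n' "$new" >"$ref.lock" ) 2>/dev/null || {
		echo >&2 "unable to lock $ref"
		return 1
	}
	# The rename replaces the old file. It needs write permission on the
	# directory, not on the old ref file itself.
	mv -f "$ref.lock" "$ref"
}
```

Because the old ref file is replaced rather than rewritten, its own
permission bits do not matter to the next writer; only the directory has
to be group-writable.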

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Shared repositories and umask
From: Johannes Schindelin @ 2006-02-15 13:51 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Martin Mares, git
In-Reply-To: <20060215130538.GO31278@pasky.or.cz>

Hi,

On Wed, 15 Feb 2006, Petr Baudis wrote:

> Dear diary, on Wed, Feb 15, 2006 at 01:19:07PM CET, I got a letter
> where Martin Mares <mj@ucw.cz> said that...
> > I'm playing with a shared repository and I am still unable to get the
> > file and directory permissions kept correctly, that is writeable to
> > a group.
> > 
> > Setting the `core.sharedrepository' flag helps a bit, but not completely:
> > the object files and directories are group-writeable, but for example new
> > head refs aren't.
> 
>   actually, this is not necessary, since when pushing to shared
> repositories, the new ref is created in the directory as a lockfile and
> then moved over the original ref - this makes the ref updating safe and
> raceless, while also making it enough to have the refs directory
> group-writable.

IIRC the relevant discussion was started by this:

http://thread.gmane.org/gmane.comp.version-control.git/13856

>   Therefore, it shouldn't be actually necessary to meddle with umask
> anymore. The documentation is obsolete; I'll remove the relevant bits
> from Cogito docs.

Basically, if you just want a shared repository, you don't need to set the 
umask. However, if you want to work in the working directory (multiple 
users), you have to set the umask (it is not enough that the git tools do 
that, because you are likely to work with other programs as well).

Hth,
Dscho


* Re: Shared repositories and umask
From: Petr Baudis @ 2006-02-15 13:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Martin Mares, git
In-Reply-To: <Pine.LNX.4.63.0602151448590.10593@wbgn013.biozentrum.uni-wuerzburg.de>

  Hi,

Dear diary, on Wed, Feb 15, 2006 at 02:51:50PM CET, I got a letter
where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> On Wed, 15 Feb 2006, Petr Baudis wrote:
> >   Therefore, it shouldn't be actually necessary to meddle with umask
> > anymore. The documentation is obsolete; I'll remove the relevant bits
> > from Cogito docs.
> 
> Basically, if you just want a shared repository, you don't need to set the 
> umask. However, if you want to work in the working directory (multiple 
> users), you have to set the umask (it is not enough that the git tools do 
> that, because you are likely to work with other programs as well).

  yes, but that's kind of a rare workflow - I guess mostly when you have
your website in GIT and update the working copy in the post-update hook
- but then you can easily set up the umask in the hook as well.
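
A minimal sketch of confining the umask change to the hook, assuming a
POSIX shell. `with_group_umask` is a hypothetical helper name, and the
command that actually refreshes the working copy depends on the setup,
so it is left as a placeholder.

```shell
#!/bin/sh
# Sketch for hooks/post-update: run one command with a group-writable
# umask. The subshell keeps the change from leaking into anything else.
with_group_umask() {
	( umask 002 && "$@" )
}

# Hypothetical usage inside the hook (placeholder command name):
# with_group_umask your-working-copy-update-command
```

Files created by the wrapped command come out group-writable, while the
user's login umask stays as tight as before.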

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams


* Re: Shared repositories and umask
From: Martin Mares @ 2006-02-15 14:06 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20060215130538.GO31278@pasky.or.cz>

>   Therefore, it shouldn't be actually necessary to meddle with umask
> anymore. The documentation is obsolete; I'll remove the relevant bits
> from Cogito docs.

Thanks for the hint!

				Have a nice fortnight
-- 
Martin `MJ' Mares   <mj@ucw.cz>   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
First law of socio-genetics: Celibacy is not hereditary.


* Re: Handling large files with GIT
From: Linus Torvalds @ 2006-02-15 15:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Kuivinen, Git Mailing List
In-Reply-To: <7vd5hpj6ab.fsf@assigned-by-dhcp.cox.net>



On Wed, 15 Feb 2006, Junio C Hamano wrote:
> 
> I was thinking about implementing mergers as a pipeline:
> 
> 	git-merge-tree O A B |
>         git-merge-renaming A |
>         git-merge-aggressive A |
>         git-merge-filemerge

Great minds think alike.

> git-merge-tree (yours) does not do trivial collapsing, and
> produce raw-diff from A.

(It does _truly_ trivial collapsing, but I think we both agree: it doesn't 
do anything that we used to go git-merge-one-file on)

> git-merge-renaming reads it, finds
> copied/renamed entries (maybe reusing parts of diffcore), and
> writes out the results in the same format as merge-tree output

I was considering perhaps doing a first cut at that in git-merge-tree 
already. Not sure.

One issue is that I think I may have to change the output format if I do 
that. I should anyway. 

Why?

It's hard to see where "one event" stops, and another starts. I stupidly 
initially thought that you can do it entirely based on looking at the 
numbers, but you can't. Right now you have to look at the pathname too, 
which is kind of sad, and doesn't work after rename detection (since then 
the pathnames won't be sorted any more, and one "event" can have different 
pathnames in different stages).

[ Side note: it doesn't even work for file/directory conflicts, which can 
  have the same name, but are two different "events". So you'd actually 
  have to look at both mode _and_ filename to sort out if two lines that 
  start with "1" and "3" respectively are one event (removal in first 
  branch) or two events ("1" on one file: removal in both branches + "3" 
  on another file: add in second branch) ]

So to do the rename output, you can't use the same format as merge-tree 
uses right _now_. We'd have to add a marker to mark what the event 
boundaries are.

The "mark" could be a running "event number", or even as easy as an 
alternating character ("#" vs " " as the second character in the line or 
similar)

So instead of

	2 100644 ff280e2e1613e808e4d7844376134dfa2bb1fc21 Documentation/cputopology.txt
	2 100644 28c5b7d1eb90f0ccd8e0307c170f89bd7954dc9c Documentation/hwmon/f71805f
	1 100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	3 100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	2 100644 00a009b977e92b1a942d1138afdccf1b725df956 Documentation/i2c/busses/i2c-sis96x
	2 100644 90a5e9e5bef1daa9d0f0621e209827f0d180f384 Documentation/unshare.txt
	2 100644 5127f39fa9bf9a384a6529c6d5deb1002e945de5 arch/arm/mach-s3c2410/s3c2400-gpio.c
	2 100644 8b2394e1ed4088c3b8d38e87e58bde2f38152bf7 arch/arm/mach-s3c2410/s3c2400.h
	 ...

it might be

	2#100644 ff280e2e1613e808e4d7844376134dfa2bb1fc21 Documentation/cputopology.txt
	2 100644 28c5b7d1eb90f0ccd8e0307c170f89bd7954dc9c Documentation/hwmon/f71805f
	1#100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	3#100644 b88953dfd58022aef1680c266c7438605b146fc8 Documentation/i2c/busses/i2c-sis69x
	2 100644 00a009b977e92b1a942d1138afdccf1b725df956 Documentation/i2c/busses/i2c-sis96x
	2#100644 90a5e9e5bef1daa9d0f0621e209827f0d180f384 Documentation/unshare.txt
	2 100644 5127f39fa9bf9a384a6529c6d5deb1002e945de5 arch/arm/mach-s3c2410/s3c2400-gpio.c
	2#100644 8b2394e1ed4088c3b8d38e87e58bde2f38152bf7 arch/arm/mach-s3c2410/s3c2400.h
	 ...

where you can clearly see the "grouping" without having to even look at 
the filename.

(The example I show actually has a rename-with-modifications that was made 
on the first branch: notice that i2c-sis69x vs i2c-sis96x thing?)

I don't know exactly what the "after rename detection" output format would 
be, but it _might_ turn that

	...
	1#b889... i2c-sis69x
	3#b889... i2c-sis69x
	2 00a0... i2c-sis96x
	...

into one event:

	...
	1#b889... i2c-sis69x
	2#00a0... i2c-sis96x
	3#b889... i2c-sis69x
	...

and then the actual file-merge logic would have to merge the names as well 
as the file contents (and in this case, the final name would thus be 
"i2c-sis96x", since one branch hadn't changed it).

Hmm?

		Linus


* Re: [FYI] pack idx format
From: Nicolas Pitre @ 2006-02-15 16:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vd5hpm2x0.fsf@assigned-by-dhcp.cox.net>

On Wed, 15 Feb 2006, Junio C Hamano wrote:

> This is still WIP but if anybody is interested...  Once done, it
> should become Documentation/technical/pack-format.txt.
> 
[...]
> 
> Pack file entry: <+
> 
>      packed object header:
> 	1-byte type (upper 4-bit)

Actually the type occupies only 3 bits (bits 4 to 6) as bit 7 is the 
size continuation bit.
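
In shell arithmetic, the bit layout described above can be sketched as
follows. `decode_pack_header_byte` is a hypothetical helper; treating
the low four bits as the first size bits is my reading of the pack
format, not something stated in this message.

```shell
#!/bin/sh
# Sketch: decode the first byte of a packed object header.
# Bit 7: size continuation flag; bits 6-4: object type;
# bits 3-0: low bits of the size (assumption, see the lead-in).
decode_pack_header_byte() {
	byte=$1
	echo "more=$(( (byte >> 7) & 1 )) type=$(( (byte >> 4) & 7 )) size_low=$(( byte & 15 ))"
}
```

For example, byte 149 (binary 10010101) has the continuation bit set,
type 1 and low size bits 5.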


Nicolas


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Catalin Marinas @ 2006-02-15 17:12 UTC (permalink / raw)
  To: git
In-Reply-To: <20060214045618.GA12844@spearce.org>

On 14/02/06, Shawn Pearce <spearce@spearce.org> wrote:
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > - Automatic detection (and cancellation) of returning patches.
[...]
> > StGIT has been doing this from the beginning. You would need to run a
> > 'stg clean' after a rebase (or push). I prefer to run this command
> > manually so that 'stg series -e' would show the empty patches and let
> > me decide what to do with them.
>
> Actually StGIT didn't do this correctly for one of my use cases
> and that's one of the things that drove me to trying to write pg
> (because I wondered if there was a way to resolve it automatically).
> Try building a patch series such as:
[...]
> StGIT seemed to not handle this when it tried to reapply the two
> already applied patches.  A won't apply because the file coming
> down is actually A+B, not A's predecessor and not A.  B won't apply
> because the file also isn't A (B's predecessor).

You are right, if two patches modify the same line and both were
merged upstream, the three-way merging would report a conflict for the
first patch and maybe the second (depending on how the first conflict
was resolved).

> pg resolves this by attempting to automatically fold patches during
> a pg-rebase (equiv. of stg pull).  If a patch fails to push cleanly
> and there's another patch immediately behind it which also should
> be reapplied pg aborts and retries pushing the combination of the
> patches.  This fixes my A+B case quite nicely during a rebase.  :-)

But what would happen if there was a third-party patch that's
modifying the same line? A+B application would fail in this case. Does
pg go back to only apply A and report a conflict?

There is another problem with this approach if you have tens of
patches. Would pg try to fold all of them?

Some time ago I had a look at Darcs and its patch theory (patch
commuting). Their approach to conflicts was to include the conflicts
in patch A and propagate them to the last patch to be merged. It's
like creating two versions of the conflicting hunk, one of them
corresponding to the local tree (that in patch A) and the other to the
upstream tree. Merging patch B is done only in the local hunk; in the
end both conflicting hunks would be identical and one of them removed.

While the above algorithm seems to work OK in Darcs (though it is quite
resource intensive), it's pretty hard to implement and I don't think it's
worth it for the small number of cases where this could occur.

--
Catalin


* Re: Handling large files with GIT
From: Linus Torvalds @ 2006-02-15 17:16 UTC (permalink / raw)
  To: Junio C Hamano, Ben Clifford; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0602150715470.3691@g5.osdl.org>



Btw, some actual numbers: I did the recent kernel networking merge (which 
is a trivial in-index merge) with the standard three-way

	git-read-tree -m <base> <branch> <branch>

and with the new git-merge-tree to compare performance.

Doing git-read-tree takes ~0.35s, while git-merge-tree took 0.015s.

Now, that's not a really fair comparison, because the end result is very 
different: the git-read-tree has populated the index, ready for a 
git-write-tree, while the git-merge-tree has not. 

However, the interesting part is that especially for a trivial merge, we 
don't actually _want_ to necessarily populate the index, because doing a 
"git-write-tree" is actually a pretty expensive operation (on the kernel, 
it will try to write 1000+ directory trees, most of which already exist. 
Admittedly we don't actually have to write the objects, since we figure 
out that they already exist, but we have to do the SHA1 calculations to 
do so).

So if we made the git-merge-tree based merge work entirely on trees all 
the way, and never even necessarily populate the index at all (unless it 
has to, due to actual data conflicts that want to be fixed up), that would 
actually be another performance advantage. The only downside there is that 
we would literally have to write the resulting tree objects by hand (ie 
we'd need a new helper for doing that, and another thing to validate).

Anyway, that should almost certainly make it possible to scale up git 
merges to hundreds of thousands of files without huge performance problems 
(still, that depends a bit on layout - again, flat directory structures 
won't scale as well, so it might not be enough for maildir handling).

But just at a guess, I think there's at least an order of magnitude to be 
had there. So if a maildir merge currently takes an hour, at least we 
should be able to get it down to a few minutes.

Ben, are you interested in trying this out in your maildir experiments?

		Linus


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Catalin Marinas @ 2006-02-15 17:20 UTC (permalink / raw)
  To: git
In-Reply-To: <20060214061406.GA13238@spearce.org>

On 14/02/06, Shawn Pearce <spearce@spearce.org> wrote:
> The diff-tree/apply approach is faster for a single commit than
> read-tree -u -m is; even if totally different files are being
> impacted and thus all stages collapse neatly to stage 0 in the index.
> No wonder StGIT uses diff/apply!

For the simple tests you did the difference is not that big. It
becomes a real problem when there are many file deletions/additions in
the upstream tree since git-read-tree doesn't handle them and
git-merge-index would need to call the external tool for each of them.

To test the above, clone the 2.6.12 kernel version, create some
trivial patches and rebase to 2.6.16-rc3. StGIT took as long as 5
minutes per patch before the diff-tree/apply method was implemented.

--
Catalin


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Catalin Marinas @ 2006-02-15 17:25 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Chuck Lever, Karl Hasselström, git
In-Reply-To: <20060214222913.GK31278@pasky.or.cz>

On 14/02/06, Petr Baudis <pasky@suse.cz> wrote:
> Dear diary, on Tue, Feb 14, 2006 at 09:58:02PM CET, I got a letter
> where Chuck Lever <cel@citi.umich.edu> said that...
> > my impression of git is that you don't change stuff that's already
> > committed.  you revert changes by applying a new commit that backs out
> > the original changes.  i'm speculating, but i suspect that's why there's
> > a "stg pick --reverse" and not a "stg uncommit."
>
> It is ok as long as you know what you are doing - if you don't push out
> the commits you've just "undid" (or work on a publicly accessible
> repository in the first place, but I think that's kind of rare these
> days; quick survey - does anyone reading these lines do that?), there's
> nothing wrong with it, and it gives you nice flexibility.
>
> For example, to import a bunch of patches (I guess that's the original
> intention behind this) you just run git-am on them and then stg uncommit
> all of the newly added commits.

This is a sensible way of using an uncommit command but I initially
thought it would be better to make things harder for people wanting to
re-write the history. Anyway, I'll keep this command on my todo list.

--
Catalin


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: Shawn Pearce @ 2006-02-15 17:55 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <b0943d9e0602150912h55fb87d0r@mail.gmail.com>

Catalin Marinas <catalin.marinas@gmail.com> wrote:
> > pg resolves this by attempting to automatically fold patches during
> > a pg-rebase (equiv. of stg pull).  If a patch fails to push cleanly
> > and there's another patch immediately behind it which also should
> > be reapplied pg aborts and retries pushing the combination of the
> > patches.  This fixes my A+B case quite nicely during a rebase.  :-)
> 
> But what would happen if there was a third-party patch that's
> modifying the same line? A+B application would fail in this case. Does
> pg go back to only apply A and report a conflict?

When this occurs pg just gives up and leaves both patches A and
B unapplied and gives you the list of patches which it couldn't
apply but wanted to.  The working directory is left clean; it's the
new base plus whatever patches before A that did apply cleanly.
I could have pg go back and try pushing A again and leave the
conflict ready for you to resolve but I don't always want that.
Since the user can have that happen with a quick no-arg `pg-push`
I leave it to the user to retry pushing A if they really think
that's worth trying.

However if the last patch fails to push during a pg-rebase then pg
leaves it alone and your working directory is dirty and you are left
with that last patch partially applied.  At which point you can back
it out by popping it off the stack or finish the conflict resolution.
 
> There is another problem with this approach if you have tens of
> patches. Would pg try to fold all of them?

Yea.  Which might not be pretty.  10 patches would cause pg to
attempt applying 11 patches before giving up, but each time the patch
is increased in size to include its predecessors who also didn't
apply cleanly.  As soon as a larger cluster applies pg goes back to
trying single patch application.  Obviously this could take a while
as the patch size is growing on each attempt and we are duplicating
work every time as pg always starts from a clean working directory.

Example: Say I have A, B, C, D, E, F on the stack.  A wasn't provided
by the upstream and pushes down cleanly.  B+C+D was given to me
by the upstream so pg first tries B, fails, then B+C, fails, then
B+C+D, succeeds, so it folds B+C+D into D and finishes pushing D.
Then it tries E, if E succeeds it tries F on its own.  If E fails it
tries E+F.  What's left in the working directory depends on if the
last operation was an auto-fold attempt or not and if it applied
cleanly (or not).
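
The retry-with-a-bigger-cluster loop from this example can be sketched
in shell. This is not pg's code: `fold_push` is a hypothetical function,
and its first argument is a hypothetical predicate command that receives
a candidate cluster as one string (such as "B+C+D") and succeeds if that
cluster applies cleanly.

```shell
#!/bin/sh
# Sketch of the auto-fold loop. $1 is the predicate command; the
# remaining arguments are patch names, bottom of the stack first.
fold_push() {
	applies=$1; shift
	cluster=
	for patch
	do
		# Grow the pending cluster by one patch and retry it as a whole.
		cluster=${cluster:+$cluster+}$patch
		if "$applies" "$cluster"
		then
			echo "pushed $cluster"
			# Success: go back to trying single patches.
			cluster=
		fi
	done
	if test -n "$cluster"
	then
		# The stack ran out while a cluster was still failing.
		echo "failed $cluster"
		return 1
	fi
}
```

With a predicate that accepts only "A", "B+C+D", "E" and "F", calling
`fold_push` on the stack A B C D E F tries B, then B+C, then B+C+D, and
pushes A, the folded B+C+D, E and F in that order.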

> Some time ago I had a look at Darcs and its patch theory (patch
> commuting). Their approach to conflicts was to include the conflicts
> in patch A and propagate them to the last patch to be merged. It's
> like creating two versions of the conflicting hunk, one of them
> corresponding to the local tree (that in patch A) and the other to the
> upstream tree. Merging patch B is done only in the local hunk; in the
> end both conflicting hunks would be identical and one of them removed.
> 
> While the above algorithm seems to work OK in Darcs (though it is quite
> resource intensive), it's pretty hard to implement and I don't think it's
> worth it for the small number of cases where this could occur.

Hmm.  I had looked at Darcs over a year ago and found it to be a
rather interesting idea but at the time it couldn't handle my ~7000
file tree (and GIT wasn't even getting started yet).  I was actually
thinking about trying to drag the rejecting hunks forward somehow
when doing the auto-folding but I hadn't quite found a way to do
that easily.  I have a gut feeling that most of the time when this
problem occurs it's on a subset of the files involved in any given
patch and that if I can push down a patch cleanly for 90+% of the
files while delaying the conflicts forward that might actually be
somewhat reasonable.  But maybe not.  :-)

-- 
Shawn.


* [PATCH] More useful/hinting error messages in git-checkout
From: Josef Weidendorfer @ 2006-02-15 19:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3bilr2zr.fsf@assigned-by-dhcp.cox.net>


Signed-off-by: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
---

On Tuesday 14 February 2006 23:26, you wrote:
>
> 	$ git checkout -b test v2.6.10
> 
> The user wanted to create a new branch test based on tag
> v2.6.10, alas that tag does not exist.  We give quite confusing
> error message because we are confused that the user meant to
> checkout only "./v2.6.10" file and that operation and switching
> branches are incompatible.

Does this patch clarify the error condition?

Josef


 git-checkout.sh |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

d15e024c0bd07a2f0dad6e2729e2681df374c8e6
diff --git a/git-checkout.sh b/git-checkout.sh
index 6a87c71..b7d892d 100755
--- a/git-checkout.sh
+++ b/git-checkout.sh
@@ -22,7 +22,7 @@ while [ "$#" != "0" ]; do
 		[ -e "$GIT_DIR/refs/heads/$newbranch" ] &&
 			die "git checkout: branch $newbranch already exists"
 		git-check-ref-format "heads/$newbranch" ||
-			die "we do not like '$newbranch' as a branch name."
+			die "git checkout: we do not like '$newbranch' as a branch name."
 		;;
 	"-f")
 		force=1
@@ -75,9 +75,15 @@ done
 
 if test "$#" -ge 1
 then
+	hint=
+	if test "$#" -eq 1
+	then
+		hint="
+Did you intend to checkout '$@' which can not be resolved as commit?"
+	fi
 	if test '' != "$newbranch$force$merge"
 	then
-		die "updating paths and switching branches or forcing are incompatible."
+		die "git checkout: updating paths is incompatible with switching branches/forcing$hint"
 	fi
 	if test '' != "$new"
 	then
@@ -117,7 +123,8 @@ fi
 
 [ -z "$branch$newbranch" ] &&
 	[ "$new" != "$old" ] &&
-	die "git checkout: you need to specify a new branch name"
+	die "git checkout: to checkout the requested commit you need to specify 
+              a name for a new branch which is created and switched to"
 
 if [ "$force" ]
 then
-- 
1.2.0.g719b


* Re: [ANNOUNCE] pg - A patch porcelain for GIT
From: J. Bruce Fields @ 2006-02-15 19:45 UTC (permalink / raw)
  To: Sam Vilain, Petr Baudis, Chuck Lever, Karl Hasselström,
	Catalin Marinas, git
In-Reply-To: <20060215065411.GB26632@spearce.org>

On Wed, Feb 15, 2006 at 01:54:11AM -0500, Shawn Pearce wrote:
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > On Tue, Feb 14, 2006 at 07:35:10PM -0500, Shawn Pearce wrote:
> > > Publishing a repository with a stg (or pg) patch series isn't
> > > a problem; the problem is that no clients currently know how to
> > > follow along with the remote repository's patch series.  And I can't
> > > think of a sensible behavior for doing so that isn't what git-core is
> > > already doing today for non patch series type clients (as in don't go
> > > backwards by popping but instead by pushing a negative delta).  :-)
> > 
> > If you represent each patch as a branch, with each modification to the
> > patch a commit on the corresponding branch, and each "push" operation a
> > merge from the branch corresponding to the previous patch to a branch
> > corresponding to the new patch (isn't that what pg's trying to do?),
> > then it should be possible just to track the branch corresponding to the
> > top patch.
> 
> Yes that's pg in a nutshell.
> 
> But what happens when I pop back two patches (of three) and then push
> down a different (fourth) patch?  The tree just rewound backwards
> and then forwards again in a different direction.

So you've got p1, p2, and p3 applied, each with its corresponding
branch--respectively, b1, b2, and b3.  Popping two patches just checks
out b1, and doesn't affect the repository at all.  If you push a new
patch, p4, you've just created a new branch, b4--you haven't touched the
existing branches.  If you push p2 and p3 back on, you're just merging
the new changes from b4 into b2 and then merging the newly merged b2
into b3.

From the point of view of someone tracking b3, this is all fine.  OK,
maybe it's excessively complicated, but pulls should work, because it
never sees history disappear as it does when you represent each patch
with a commit on a single branch.

> > If you really want revision control on patches the simplest thing might
> > be just to run quilt or Andrew Morton's scripts on top of a git
> > repository--the documentation with Andrew's scripts recommends doing
> > that with CVS.
> 
> True but you also then run into problems about needing to know which
> base each patch revision was applied against so you can reproduce
> a source tree plus patch at a specific point in time.

Right, so you keep the tree under revision control as well as the
patches.

--b.


* Modified files coming from v2.6.12 checkout
From: Shawn Pearce @ 2006-02-15 20:04 UTC (permalink / raw)
  To: git

I'm trying to do a performance test suggested by Catalin. I cloned
(what I thought to be) Linus' public kernel tree[1] then locally
cloned it again and tried to checkout a working directory of v2.6.12:

  $ git-clone [1]
  $ git-clone -l -n linux-2.6 bigmergetest
  $ cd bigmergetest
  $ ls -a
  .       ..      .git
  $ git-update-ref HEAD $(git-rev-parse --verify v2.6.12^{commit})
  $ git-read-tree HEAD
  $ git-checkout-index -u -a
  git-checkout-index: include/linux/netfilter_ipv4/ipt_connmark.h already exists
  ...
  git-checkout-index: net/ipv4/netfilter/ipt_tos.c already exists
  git-checkout-index: net/ipv6/netfilter/ip6t_mark.c already exists

That can't be right.  Why do so many files already exist during
an empty checkout? git-status is reporting these files as being
modified.  If I commit these 'modified' files there's actually a
rather large delta if I diff v2.6.12 and the new commit.

I've tried this both with git 1.1.4 and 1.2.0.  Same result.
I've also tried it with both the v2.6.12 tag and the current HEAD.
Same result just different files having the problem.

I just looked at the linux-2.6 directory which I cloned from [1];
it appears to have the same problem but on a slightly different
set of files than the v2.6.12 clone:

  $ git-status | grep modified | wc -l
       18

Thoughts?  Suggestions of where to start looking for a fault?
Does the fault exist between the chair and the keyboard?


[1] http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/

-- 
Shawn.


* Re: Modified files coming from v2.6.12 checkout
From: Shawn Pearce @ 2006-02-15 20:21 UTC (permalink / raw)
  To: git
In-Reply-To: <20060215200442.GB5742@spearce.org>

Gaaah.  This was on MacOS X.  Whose filesystem isn't case
sensitive yet somehow tries to be.  If ipt_TOS.c and ipt_tos.c
both exist in the same directory I'm not surprised this works ok
for everyone else except me.  :-)
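
A quick way to check a tree for this kind of clash before checking it
out on a case-insensitive filesystem is to look for path names that
differ only in case. A minimal sketch; `case_collisions` is a
hypothetical helper that reads one path per line on stdin.

```shell
#!/bin/sh
# Sketch: print (in lowercase) every path that appears more than once
# after case folding, i.e. paths that differ only in case.
case_collisions() {
	tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Feeding it the list of tracked paths (for example the output of
`git ls-files`) would list `net/ipv4/netfilter/ipt_tos.c` if both
`ipt_TOS.c` and `ipt_tos.c` are present.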

Fault found: A problem exists between the chair and the keyboard.
I'll switch to a Linux system to work with Linux kernel sources.

Sorry for the noise.
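
For anyone else who runs into this on a case-insensitive filesystem, here is
a rough way to list tracked paths that differ only by case. It is only a
sketch: the function name find_case_collisions is made up for illustration,
and it simply reads one path per line on stdin.

```sh
# Hypothetical helper: print (in lowercase) every path that occurs more
# than once when case is ignored.  Reads one path per line on stdin.
find_case_collisions() {
	tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Possible use inside a repository (assumes git-ls-files is on PATH):
#   git-ls-files | find_case_collisions
```

Any line it prints names a pair (or more) of files that cannot both be
checked out on such a filesystem.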


Shawn Pearce <spearce@spearce.org> wrote:
> I'm trying to do a performance test suggested by Catalin. I cloned
> (what I thought to be) Linus' public kernel tree[1] then locally
> cloned it again and tried to check out a working directory of v2.6.12:
> 
>   $ git-clone [1]
>   $ git-clone -l -n linux-2.6 bigmergetest
>   $ cd bigmergetest
>   $ ls -a
>   .       ..      .git
>   $ git-update-ref HEAD $(git-rev-parse --verify v2.6.12^{commit})
>   $ git-read-tree HEAD
>   $ git-checkout-index -u -a
>   git-checkout-index: include/linux/netfilter_ipv4/ipt_connmark.h already exists
>   ...
>   git-checkout-index: net/ipv4/netfilter/ipt_tos.c already exists
>   git-checkout-index: net/ipv6/netfilter/ip6t_mark.c already exists
> 
> That can't be right.  Why do so many files already exist during
> an empty checkout? git-status is reporting these files as being
> modified.  If I commit these 'modified' files there's actually a
> rather large delta if I diff v2.6.12 and the new commit.
> 
> I've tried this both with git 1.1.4 and 1.2.0.  Same result.
> I've also tried it with both the v2.6.12 tag and the current HEAD.
> Same result just different files having the problem.
> 
> I just looked at the linux-2.6 directory which I cloned from [1];
> it appears to have the same problem but on a slightly different
> set of files than the v2.6.12 clone:
> 
>   $ git-status | grep modified | wc -l
>        18
> 
> Thoughts?  Suggestions of where to start looking for a fault?
> Does the fault exist between the chair and the keyboard?
> 
> 
> [1] http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/

-- 
Shawn.

^ permalink raw reply

* git faq : draft and rfc
From: Thomas Riboulet @ 2006-02-16  0:36 UTC (permalink / raw)
  To: git

hi,

After a quick talk on irc with pasky, fonseca and Tv, I've started to
write a git faq starting with questions I had and they have suggested.

I've set up a git repo : http://git.librium.org/git_faq.git (atm
it's on my DSL link, so it could be slow, should move soon)

The faq is available in both docbook xml and text format (don't know
what you prefer).
You can see the html output there : http://koalabs.org/~ange/git_faq/faq.html
and the text file : http://koalabs.org/~ange/git_faq/faq.txt
If needed I can write specific xsl stylesheets to produce better or
more suitable output.

Comments and suggestions are welcome (on the content, the form, format, etc ...)
I'll try to add questions from the archives of this ml, I'm also open
to any suggestions.

Here is a first (text) version :
----

. Why the 'git' name ?
In the words of Linus, the inventor of git :
"git" can mean anything, depending on your mood.
  - random three-letter combination that is pronounceable, and not
    actually used by any common UNIX command.  The fact that it is a
    mispronunciation of "get" may or may not be relevant.
  - stupid. contemptible and despicable. simple. Take your pick from the
    dictionary of slang.
  - "global information tracker": you're in a good mood, and it actually
    works for you. Angels sing, and a light suddenly fills the room.
  - "goddamn idiotic truckload of sh*t": when it breaks

. Can I use my git public repository in a shared way ?
Yes. Use cg-admin-setuprepo -g or do git-init-db --shared and some
additional stuff. It's ok that refs aren't group writable; it's
enough that the directory is. See Cogito README or GIT's cvs-migration doc,
"Emulating the CVS Development Model" for details.

. Git commit is dying telling me "fatal : empty ident <user@myhost>
not allowed", what's wrong ?
Make sure your Full Name is not empty in chfn, i.e. that the 5th field of
your user line in /etc/passwd isn't empty. If your @myhost is empty make
sure your hostname is correctly set.

. What's the difference between fetch and pull ?
Fetch : download objects and a head from another repository.
Pull : fetch and merge from another repository.
See man git-fetch and git-pull for more.

. Can I tell git to ignore files ?
Yes. Put the file's path (relative to the top of the repository) in the
.git/info/exclude file.

. Can I import from cvs ?
Yes. Use git-cvsimport. See the cvs-migration doc for more details.

. Can I import from svn ?
Yes. Use git-svnimport. See the svn-import doc for more details.

. What can I use to setup a public repository ?
An ssh server, an http server, or the git-daemon.
See the tutorial for more details.
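
For the "ignore files" question above, a minimal sketch of adding a pattern
to .git/info/exclude from the shell. The helper name ignore_path and the
example patterns are assumptions for illustration, not part of git.

```sh
# Hypothetical helper: append an ignore pattern to a repository's
# .git/info/exclude unless that exact line is already present.
#   $1 = top of the working tree (must contain a .git directory)
#   $2 = pattern, e.g. '*.o'
ignore_path() {
	[ -d "$1/.git" ] || return 1
	exclude="$1/.git/info/exclude"
	mkdir -p "$1/.git/info" || return 1
	touch "$exclude" || return 1
	grep -qxF -e "$2" "$exclude" || printf '%s\n' "$2" >>"$exclude"
}

# Possible use from the top of a working tree:
#   ignore_path . '*.o'
```

The file is local to one repository, so patterns added this way are not
shared with people who clone it.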


--
Thom/ange

^ permalink raw reply

* Re: "git reset --hard" leaves empty directories that shouldn't exist
From: Linus Torvalds @ 2006-02-16  1:35 UTC (permalink / raw)
  To: Carl Worth; +Cc: git
In-Reply-To: <87y80dhxfd.wl%cworth@cworth.org>



On Tue, 14 Feb 2006, Carl Worth wrote:
>
> I've been exploring the potential for git-sync, and I found some odd
> behavior with "git reset --hard". It appears that if the current tree
> has some directory structure (at least two levels deep) that does not
> exist in the tree being reset to, that empty directories are left
> around after the reset:

"git reset --hard xyz" in many ways is a sledgehammer, and it says "I want 
the state at the point of xyz, and I don't care _what_ the heck the 
current state is".

Now, that's somewhat problematic, exactly because of that "screw the 
current state" thing. It actually tries to remove files (see the 
"tmp-exists" thing in the git-reset script), but it's being pretty stupid 
about it. It also very definitely will not try to remove subdirectories, 
empty or not.

(I say that without being able to read perl, so I might be wrong. Maybe it 
tries and just fails).

Anyway, if you want to do the "gentle and smart" thing, you should 
probably actually use

	git-read-tree -m -u <oldtree> <newtree>

which unlike "git-reset" will gently _update_ the tree from one version to 
another (and will error out if your checked-out copy doesn't match the 
old tree).

And the gentle way will actually do the right thing wrt subdirectories 
(note that it will _not_ remove a subdirectory if you have left files - 
like object files - around in it that it doesn't know about: that's not 
an error, but neither the unknown file nor the subdirectory will be 
removed).

And yes, git-reset should probably do the subdirectory thing too. In the 
meantime you should think of it as the brute-force and not very smart way 
(in Calvin and Hobbes terms, "git reset" is Moe).
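
Until git-reset learns to do that, the leftover empty directories can be 
cleaned up by hand. A rough sketch, not anything git ships: the function 
name is made up, it relies on rmdir refusing to remove non-empty 
directories, and it leaves .git alone.

```sh
# Hypothetical helper: remove empty directories below $1, deepest first.
# rmdir fails (harmlessly) on anything that still has entries in it,
# so directories holding untracked files such as objects are kept.
prune_empty_dirs() {
	find "$1" -depth -type d \
		! -path "$1" ! -path "$1/.git" ! -path "$1/.git/*" \
		-exec rmdir {} \; 2>/dev/null
	return 0
}

# Possible use from the top of a working tree, after a hard reset:
#   prune_empty_dirs .
```

Because of -depth, a chain like a/b/c goes away completely once c is empty, 
while a directory with a stray file in it stays, file and all.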

			Linus

^ permalink raw reply

* [PATCH] pack-objects: reuse data from existing pack.
From: Junio C Hamano @ 2006-02-16  1:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vd5hpm2x0.fsf@assigned-by-dhcp.cox.net>

When generating a new pack, notice if we already have the wanted
object in existing packs.  If the object has a deltified
representation, and its base object is also what we are going to
pack, then reuse the existing deltified representation
unconditionally, bypassing all the expensive find_deltas() and
try_deltas() routines.

Also, when writing out such deltified representation and
undeltified representation, if matching data already exists in
an existing pack, just write it out without uncompressing &
recompressing.
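
Copying an entry verbatim requires knowing how many bytes it occupies in
the existing pack.  The patch derives that from the sorted list of object
offsets, using the start of the trailing 20-byte checksum as the end
marker.  A rough shell sketch of that calculation follows; it is not the
patch's code, and the function name and input format are assumptions for
illustration.

```sh
# Hypothetical helper: read object offsets (one per line, any order) on
# stdin and print "offset size" for each, where size is the distance to
# the next offset.  $1 is the total pack file size; the last 20 bytes
# are the pack checksum, so the final object ends at $1 - 20.
packed_object_sizes() {
	{ cat; echo $(( $1 - 20 )); } | sort -n | awk '
		NR > 1 { print prev, $1 - prev }
		{ prev = $1 }
	'
}

# Possible use with made-up numbers:
#   printf "%s\n" 300 12 150 | packed_object_sizes 1020
```

The size computed this way covers the entry's header as well as its
compressed data, which is what a verbatim copy needs.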

Without this patch:

    $ git-rev-list --objects v1.0.0 >RL
    $ time git-pack-objects p <RL

    Generating pack...
    Done counting 12233 objects.
    Packing 12233 objects....................
    60a88b3979df41e22d1edc3967095e897f720192

    real    0m32.751s
    user    0m27.090s
    sys     0m2.750s

With this patch:

    $ git-rev-list --objects v1.0.0 >RL
    $ time ../git.junio/git-pack-objects q <RL

    Generating pack...
    Done counting 12233 objects.
    Packing 12233 objects.....................
    60a88b3979df41e22d1edc3967095e897f720192
    Total 12233, written 12233, reused 12177

    real    0m4.007s
    user    0m3.360s
    sys     0m0.090s

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * This may depend on one cleanup patch I have not sent out, but
   I am so excited that I could not help sending this out first.

   Admittedly this is hot off the press, I have not had enough
   time to beat this too hard, but the resulting pack from the
   above passed unpack-objects, index-pack and verify-pack.

 pack-objects.c |  317 ++++++++++++++++++++++++++++++++++++++++++++++----------
 pack.h         |    4 +
 sha1_file.c    |   19 +++
 3 files changed, 283 insertions(+), 57 deletions(-)

0d574a3c3ec6924118d06ee0487d02d2fbb12646
diff --git a/pack-objects.c b/pack-objects.c
index c5a5e61..f2a45a2 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -9,13 +9,16 @@ static const char pack_usage[] = "git-pa
 
 struct object_entry {
 	unsigned char sha1[20];
-	unsigned long size;
-	unsigned long offset;
-	unsigned int depth;
-	unsigned int hash;
+	unsigned long size;	/* uncompressed size */
+	unsigned long offset;	/* offset into the final pack file (nonzero if already written) */
+	unsigned int depth;	/* delta depth */
+	unsigned int hash;	/* name hint hash */
 	enum object_type type;
-	unsigned long delta_size;
-	struct object_entry *delta;
+	unsigned long delta_size;	/* delta data size (uncompressed) */
+	struct object_entry *delta;	/* delta base object */
+	struct packed_git *in_pack; 	/* already in pack */
+	enum object_type in_pack_type;	/* could be delta */
+	unsigned int in_pack_offset;
 };
 
 static unsigned char object_list_sha1[20];
@@ -29,6 +32,105 @@ static const char *base_name;
 static unsigned char pack_file_sha1[20];
 static int progress = 1;
 
+static int *object_ix = NULL;
+static int object_ix_hashsz = 0;
+
+struct pack_revindex {
+	struct packed_git *p;
+	unsigned long *revindex;
+} *pack_revindex = NULL;
+static int pack_revindex_hashsz = 0;
+
+static int pack_revindex_ix(struct packed_git *p)
+{
+	unsigned int ui = (unsigned int) p;
+	int i;
+	ui = ui ^ (ui >> 16);
+	i = (int)(ui % pack_revindex_hashsz);
+	while (pack_revindex[i].p) {
+		if (pack_revindex[i].p == p)
+			return i;
+		if (++i == pack_revindex_hashsz)
+			i = 0;
+	}
+	return -1 - i;
+}
+
+static void prepare_pack_ix(void)
+{
+	int num;
+	struct packed_git *p;
+	for (num = 0, p = packed_git; p; p = p->next)
+		num++;
+	if (!num)
+		return;
+	pack_revindex_hashsz = num * 11;
+	pack_revindex = xcalloc(sizeof(*pack_revindex), pack_revindex_hashsz);
+	for (p = packed_git; p; p = p->next) {
+		num = pack_revindex_ix(p);
+		num = - 1 - num;
+		pack_revindex[num].p = p;
+	}
+	/* revindex elements are lazily initialized */
+}
+
+static int cmp_offset(const void *a_, const void *b_)
+{
+	unsigned long a = *(unsigned long *) a_;
+	unsigned long b = *(unsigned long *) b_;
+	if (a < b)
+		return -1;
+	else if (a == b)
+		return 0;
+	else
+		return 1;
+}
+
+static void prepare_pack_revindex(struct pack_revindex *rix)
+{
+	struct packed_git *p = rix->p;
+	int num_ent = num_packed_objects(p);
+	int i;
+	void *index = p->index_base + 256;
+
+	rix->revindex = xmalloc(sizeof(unsigned long) * (num_ent + 1));
+	for (i = 0; i < num_ent; i++) {
+		long hl = *((long *)(index + 24 * i));
+		rix->revindex[i] = ntohl(hl);
+	}
+	rix->revindex[num_ent] = p->pack_size - 20;
+	qsort(rix->revindex, num_ent, sizeof(unsigned long), cmp_offset);
+}
+
+static unsigned long find_packed_object_size(struct packed_git *p,
+					     unsigned long ofs)
+{
+	int num;
+	int lo, hi;
+	struct pack_revindex *rix;
+	unsigned long *revindex;
+	num = pack_revindex_ix(p);
+	if (num < 0)
+		die("internal error: pack revindex uninitialized");
+	rix = &pack_revindex[num];
+	if (!rix->revindex)
+		prepare_pack_revindex(rix);
+	revindex = rix->revindex;
+	lo = 0;
+	hi = num_packed_objects(p) + 1;
+	do {
+		int mi = (lo + hi) / 2;
+		if (revindex[mi] == ofs) {
+			return revindex[mi+1] - ofs;
+		}
+		else if (ofs < revindex[mi])
+			hi = mi;
+		else
+			lo = mi + 1;
+	} while (lo < hi);
+	die("internal error: pack revindex corrupt");
+}
+
 static void *delta_against(void *buf, unsigned long size, struct object_entry *entry)
 {
 	unsigned long othersize, delta_size;
@@ -74,39 +176,59 @@ static int encode_header(enum object_typ
 	return n;
 }
 
+static int written = 0;
+static int reused = 0;
+
 static unsigned long write_object(struct sha1file *f, struct object_entry *entry)
 {
 	unsigned long size;
 	char type[10];
-	void *buf = read_sha1_file(entry->sha1, type, &size);
+	void *buf;
 	unsigned char header[10];
 	unsigned hdrlen, datalen;
 	enum object_type obj_type;
 
-	if (!buf)
-		die("unable to read %s", sha1_to_hex(entry->sha1));
-	if (size != entry->size)
-		die("object %s size inconsistency (%lu vs %lu)", sha1_to_hex(entry->sha1), size, entry->size);
-
-	/*
-	 * The object header is a byte of 'type' followed by zero or
-	 * more bytes of length.  For deltas, the 20 bytes of delta sha1
-	 * follows that.
-	 */
 	obj_type = entry->type;
-	if (entry->delta) {
-		buf = delta_against(buf, size, entry);
-		size = entry->delta_size;
-		obj_type = OBJ_DELTA;
-	}
-	hdrlen = encode_header(obj_type, size, header);
-	sha1write(f, header, hdrlen);
-	if (entry->delta) {
-		sha1write(f, entry->delta, 20);
-		hdrlen += 20;
+	if (!entry->in_pack ||
+	    (obj_type != entry->in_pack_type)) {
+		buf = read_sha1_file(entry->sha1, type, &size);
+		if (!buf)
+			die("unable to read %s", sha1_to_hex(entry->sha1));
+		if (size != entry->size)
+			die("object %s size inconsistency (%lu vs %lu)",
+			    sha1_to_hex(entry->sha1), size, entry->size);
+		if (entry->delta) {
+			buf = delta_against(buf, size, entry);
+			size = entry->delta_size;
+			obj_type = OBJ_DELTA;
+		}
+		/*
+		 * The object header is a byte of 'type' followed by zero or
+		 * more bytes of length.  For deltas, the 20 bytes of delta
+		 * sha1 follows that.
+		 */
+		hdrlen = encode_header(obj_type, size, header);
+		sha1write(f, header, hdrlen);
+
+		if (entry->delta) {
+			sha1write(f, entry->delta, 20);
+			hdrlen += 20;
+		}
+		datalen = sha1write_compressed(f, buf, size);
+		free(buf);
+	}
+	else {
+		struct packed_git *p = entry->in_pack;
+		use_packed_git(p);
+
+		datalen = find_packed_object_size(p, entry->in_pack_offset);
+		buf = p->pack_base + entry->in_pack_offset;
+		sha1write(f, buf, datalen);
+		unuse_packed_git(p);
+		hdrlen = 0; /* not really */
+		reused++;
 	}
-	datalen = sha1write_compressed(f, buf, size);
-	free(buf);
+	written++;
 	return hdrlen + datalen;
 }
 
@@ -196,18 +318,21 @@ static int add_object_entry(unsigned cha
 {
 	unsigned int idx = nr_objects;
 	struct object_entry *entry;
-
-	if (incremental || local) {
-		struct packed_git *p;
-
-		for (p = packed_git; p; p = p->next) {
-			struct pack_entry e;
-
-			if (find_pack_entry_one(sha1, &e, p)) {
-				if (incremental)
-					return 0;
-				if (local && !p->pack_local)
-					return 0;
+	struct packed_git *p;
+	unsigned int found_offset;
+	struct packed_git *found_pack;
+
+	found_pack = NULL;
+	for (p = packed_git; p; p = p->next) {
+		struct pack_entry e;
+		if (find_pack_entry_one(sha1, &e, p)) {
+			if (incremental)
+				return 0;
+			if (local && !p->pack_local)
+				return 0;
+			if (!found_pack) {
+				found_offset = e.offset;
+				found_pack = e.p;
 			}
 		}
 	}
@@ -221,30 +346,99 @@ static int add_object_entry(unsigned cha
 	memset(entry, 0, sizeof(*entry));
 	memcpy(entry->sha1, sha1, 20);
 	entry->hash = hash;
+	if (found_pack) {
+		entry->in_pack = found_pack;
+		entry->in_pack_offset = found_offset;
+	}
 	nr_objects = idx+1;
 	return 1;
 }
 
+static int locate_object_entry_hash(unsigned char *sha1)
+{
+	int i;
+	unsigned int ui;
+	memcpy(&ui, sha1, sizeof(unsigned int));
+	i = ui % object_ix_hashsz;
+	while (0 < object_ix[i]) {
+		if (!memcmp(sha1, objects[object_ix[i]-1].sha1, 20))
+			return i;
+		if (++i == object_ix_hashsz)
+			i = 0;
+	}
+	return -1 - i;
+}
+
+static struct object_entry *locate_object_entry(unsigned char *sha1)
+{
+	int i = locate_object_entry_hash(sha1);
+	if (0 <= i)
+		return &objects[object_ix[i]-1];
+	return NULL;
+}
+
 static void check_object(struct object_entry *entry)
 {
 	char type[20];
 
-	if (!sha1_object_info(entry->sha1, type, &entry->size)) {
-		if (!strcmp(type, "commit")) {
-			entry->type = OBJ_COMMIT;
-		} else if (!strcmp(type, "tree")) {
-			entry->type = OBJ_TREE;
-		} else if (!strcmp(type, "blob")) {
-			entry->type = OBJ_BLOB;
-		} else if (!strcmp(type, "tag")) {
-			entry->type = OBJ_TAG;
-		} else
-			die("unable to pack object %s of type %s",
-			    sha1_to_hex(entry->sha1), type);
+	if (entry->in_pack) {
+		/* Check if it is delta, and the base is also an object
+		 * we are going to pack.  If so we will reuse the existing
+		 * delta.
+		 */
+		unsigned char base[20];
+		unsigned long size;
+		struct object_entry *base_entry;
+		if (!check_reuse_pack_delta(entry->in_pack,
+					    entry->in_pack_offset,
+					    base, &size,
+					    &entry->in_pack_type) &&
+		    (base_entry = locate_object_entry(base))) {
+			/* we do not know yet, but it does not matter */
+			entry->depth = 1;
+			/* uncompressed size */
+			entry->size = entry->delta_size = size;
+			entry->delta = base_entry;
+			entry->type = OBJ_DELTA; /* !! */
+			return;
+		}
+		/* Otherwise we would do the usual */
 	}
-	else
+
+	if (sha1_object_info(entry->sha1, type, &entry->size))
 		die("unable to get type of object %s",
 		    sha1_to_hex(entry->sha1));
+
+	if (!strcmp(type, "commit")) {
+		entry->type = OBJ_COMMIT;
+	} else if (!strcmp(type, "tree")) {
+		entry->type = OBJ_TREE;
+	} else if (!strcmp(type, "blob")) {
+		entry->type = OBJ_BLOB;
+	} else if (!strcmp(type, "tag")) {
+		entry->type = OBJ_TAG;
+	} else
+		die("unable to pack object %s of type %s",
+		    sha1_to_hex(entry->sha1), type);
+}
+
+static void hash_objects(void)
+{
+	int i;
+	struct object_entry *oe;
+
+	object_ix_hashsz = nr_objects * 2;
+	object_ix = xcalloc(sizeof(int), object_ix_hashsz);
+	for (i = 0, oe = objects; i < nr_objects; i++, oe++) {
+		int ix = locate_object_entry_hash(oe->sha1);
+		if (0 <= ix) {
+			error("the same object '%s' added twice",
+			      sha1_to_hex(oe->sha1));
+			continue;
+		}
+		ix = -1 - ix;
+		object_ix[ix] = i + 1;
+	}
 }
 
 static void get_object_details(void)
@@ -252,6 +446,8 @@ static void get_object_details(void)
 	int i;
 	struct object_entry *entry = objects;
 
+	hash_objects();
+	prepare_pack_ix();
 	for (i = 0; i < nr_objects; i++)
 		check_object(entry++);
 }
@@ -382,6 +578,11 @@ static void find_deltas(struct object_en
 			eye_candy -= nr_objects / 20;
 			fputc('.', stderr);
 		}
+
+		if (entry->delta)
+			/* we already know what we need to know */
+			continue;
+
 		free(n->data);
 		n->entry = entry;
 		n->data = read_sha1_file(entry->sha1, type, &size);
@@ -411,10 +612,12 @@ static void find_deltas(struct object_en
 
 static void prepare_pack(int window, int depth)
 {
-	get_object_details();
-
 	if (progress)
 		fprintf(stderr, "Packing %d objects", nr_objects);
+	get_object_details();
+	if (progress)
+		fprintf(stderr, ".");
+
 	sorted_by_type = create_sorted_list(type_size_sort);
 	if (window && depth)
 		find_deltas(sorted_by_type, window+1, depth);
@@ -599,5 +802,7 @@ int main(int argc, char **argv)
 			puts(sha1_to_hex(object_list_sha1));
 		}
 	}
+	fprintf(stderr, "Total %d, written %d, reused %d\n",
+		nr_objects, written, reused);
 	return 0;
 }
diff --git a/pack.h b/pack.h
index 9dafa2b..694e0c5 100644
--- a/pack.h
+++ b/pack.h
@@ -29,5 +29,7 @@ struct pack_header {
 };
 
 extern int verify_pack(struct packed_git *, int);
-
+extern int check_reuse_pack_delta(struct packed_git *, unsigned long,
+				  unsigned char *, unsigned long *,
+				  enum object_type *);
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 64cf245..0a3a721 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -826,6 +826,25 @@ static unsigned long unpack_object_heade
 	return offset;
 }
 
+int check_reuse_pack_delta(struct packed_git *p, unsigned long offset,
+			   unsigned char *base, unsigned long *sizep,
+			   enum object_type *kindp)
+{
+	unsigned long ptr;
+	int status = -1;
+
+	use_packed_git(p);
+	ptr = offset;
+	ptr = unpack_object_header(p, ptr, kindp, sizep);
+	if (*kindp != OBJ_DELTA)
+		goto done;
+	memcpy(base, p->pack_base + ptr, 20);
+	status = 0;
+ done:
+	unuse_packed_git(p);
+	return status;
+}
+
 void packed_object_info_detail(struct pack_entry *e,
 			       char *type,
 			       unsigned long *size,
-- 
1.2.0.gcfba7

^ permalink raw reply related

* [PATCH] packed objects: minor cleanup
From: Junio C Hamano @ 2006-02-16  1:45 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds
In-Reply-To: <7vbqx8m62q.fsf@assigned-by-dhcp.cox.net>

The delta depth is unsigned.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * And this is the clean-up patch that comes before the "packed
   representation reuse" patch.

 cache.h      |    2 +-
 pack-check.c |    4 ++--
 sha1_file.c  |    4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

f8f135c9bae74f846a92e1f1f1fea8308802ace5
diff --git a/cache.h b/cache.h
index c255421..b5db01f 100644
--- a/cache.h
+++ b/cache.h
@@ -322,7 +322,7 @@ extern int num_packed_objects(const stru
 extern int nth_packed_object_sha1(const struct packed_git *, int, unsigned char*);
 extern int find_pack_entry_one(const unsigned char *, struct pack_entry *, struct packed_git *);
 extern void *unpack_entry_gently(struct pack_entry *, char *, unsigned long *);
-extern void packed_object_info_detail(struct pack_entry *, char *, unsigned long *, unsigned long *, int *, unsigned char *);
+extern void packed_object_info_detail(struct pack_entry *, char *, unsigned long *, unsigned long *, unsigned int *, unsigned char *);
 
 /* Dumb servers support */
 extern int update_server_info(int);
diff --git a/pack-check.c b/pack-check.c
index 67a7ecd..eca32b6 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -84,7 +84,7 @@ static void show_pack_info(struct packed
 		char type[20];
 		unsigned long size;
 		unsigned long store_size;
-		int delta_chain_length;
+		unsigned int delta_chain_length;
 
 		if (nth_packed_object_sha1(p, i, sha1))
 			die("internal error pack-check nth-packed-object");
@@ -98,7 +98,7 @@ static void show_pack_info(struct packed
 		if (!delta_chain_length)
 			printf("%-6s %lu %u\n", type, size, e.offset);
 		else
-			printf("%-6s %lu %u %d %s\n", type, size, e.offset,
+			printf("%-6s %lu %u %u %s\n", type, size, e.offset,
 			       delta_chain_length, sha1_to_hex(base_sha1));
 	}
 
diff --git a/sha1_file.c b/sha1_file.c
index 3d11a9b..64cf245 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -830,7 +830,7 @@ void packed_object_info_detail(struct pa
 			       char *type,
 			       unsigned long *size,
 			       unsigned long *store_size,
-			       int *delta_chain_length,
+			       unsigned int *delta_chain_length,
 			       unsigned char *base_sha1)
 {
 	struct packed_git *p = e->p;
@@ -844,7 +844,7 @@ void packed_object_info_detail(struct pa
 	if (kind != OBJ_DELTA)
 		*delta_chain_length = 0;
 	else {
-		int chain_length = 0;
+		unsigned int chain_length = 0;
 		memcpy(base_sha1, pack, 20);
 		do {
 			struct pack_entry base_ent;
-- 
1.2.0.gcfba7

^ permalink raw reply related

* Re: [FYI] pack idx format
From: Junio C Hamano @ 2006-02-16  1:58 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0602151144010.5606@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Wed, 15 Feb 2006, Junio C Hamano wrote:
>
>> This is still WIP but if anybody is interested...  Once done, it
>> should become Documentation/technical/pack-format.txt.
>> 
> [...]
>> 
>> Pack file entry: <+
>> 
>>      packed object header:
>> 	1-byte type (upper 4-bit)
>
> Actually the type occupies only 3 bits (bits 4 to 6) as bit 7 is the 
> size continuation bit.

You are right.

^ permalink raw reply

* Genealogical branches
From: Ron Parker @ 2006-02-16  2:20 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3710 bytes --]

I am working on a stepped tutorial that walks the user through the
development of a number of related files. I have been trying to keep
this under various SCMs. Currently git seems like the best match. But
I have some questions.

The issue is that I want to have a directory which will ultimately
contain selected portions of the final product's genealogy in addition
to other files. For example:

        README
        ...
        test1
        test2
        test3a
        test3b
        ...

And each step is developed from the preceding one. However, not being
perfect, there have been occasions where I have found mistakes in an
earlier step that need to be corrected and propagated to the later
steps.

I have found a work flow that seems to do what I want, but there may
be a better way to do this and I would like some advice. I initialize
the DB and work on test1, that is then branched to step2 where test1
is first renamed and then modified to become test2.

At this point I create a "final" branch and through some shenanigans I
don't completely understand, I pull from master and B2 to get both
step1 and step2 into final with their full history. Then as changes
are made to step1 I can pull those into later steps. Another tricky
part is that when I get to final, I have to pull from all of the
ancestral branches simultaneously.

Anyway, here is an annotated script that illustrates this, it is also
attached without the extra commentary:

--- BEGIN GENETEST.SH ---
mkdir genetest
cd genetest
git init-db
cat - >test1 <<EOF
aaaaaaaa
BBBBBBBB
cccccccc
EOF
echo README>README
git add .
git commit -a -m "Initial checkin"

# Create step2 branch
git checkout -b step2
git mv test1 test2
git commit -a -m "Created step2"
echo dddddddd>>test2
echo "README 2">README
git commit -a -m "Added step2 changes"

# Create conglomerate branch
git checkout master
git checkout -b final

# This is where the shenanigans come in. I'm not even completely sure
# why it works, but I arrived at the first two commands through
# experimentation. I thought one or the other would suffice, but
# neither alone did.  The two together, however, bring this basically
# up to step2 status with test1 being deleted and a step2->step2
# merge indicated in the log.
git fetch -a . step2
git pull -a . step2

# So then this finds the deleted file and brings it back in from master.
FILES=$(git-diff-index --name-only --diff-filter=D master)
git add $FILES

git commit -a -m "Created final"

# Now let's go modify master and see whether or not changes
# propagate forward to both files in final.
git checkout master
cat - >test1 <<EOF
aaaaaaaa
bbbbbbbb
cccccccc
EOF
git commit -a -m "Fixed the B... line"

# Pull the changes into step2 and resolve the conflict from the
# too-close-together line changes.
git checkout step2
git pull . master || true
mv -f test1 test2
echo dddddddd>>test2
MSG=$(cat .git/MERGE_MSG)
git commit -a -m "$MSG"

# Now check that all changes come through successfully,
# the interesting thing is that all the "ancestor" branches
# MUST be pulled together or things don't work.
git checkout final
git pull . master step2

# Check that README matches the one from step2
# and that the B... line has been corrected in both
# versions of the ancestor file.
cat README
cat test1
cat test2

# If we've gotten this far, say so.
echo SUCCEEDED
--- END GENETEST.SH ---

So can anyone explain why the mystery portions "work" and why I have
to pull from all the ancestral branches simultaneously?

Also, if you have a better solution or work flow I'm open to it.

Thanks,

--
Ron Parker

[-- Attachment #2: genetest.sh --]
[-- Type: application/x-sh, Size: 1314 bytes --]

^ permalink raw reply

* git-rev-list --date-order ?
From: Paul Mackerras @ 2006-02-16  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio,

Gitk has a -d option that tells it to reorder the commits in
decreasing order of their commit time, subject to the constraint that
parents come after all of their children.  Currently it uses
git-rev-list --header --topo-order --parents and then reorders the
commits internally.

How hard would it be to add a --date-order flag to git-rev-list to
make it order the commits in decreasing commit time order, subject to
the constraint that parents come after their children?

If we had that then I could remove another chunk of code from gitk and
make it a bit faster.
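
To make the ordering I mean concrete, here is a rough sketch in shell and
awk.  It is not how git-rev-list or gitk does it; the function name and the
input format (one line per commit: id, commit time, then parent ids) are
assumptions for illustration.

```sh
# Hypothetical helper.  stdin: one line per commit, "id time parent...".
# stdout: ids, newest commit time first, but never a parent before any
# of its children.
date_order() {
	awk '
	{
		time[$1] = $2 + 0
		nparents[$1] = NF - 2
		for (i = 3; i <= NF; i++) {
			parent[$1, i - 2] = $i
			children[$i]++    # commits that must be shown before $i
		}
	}
	END {
		for (c in time)
			if (children[c] == 0)
				ready[c] = 1
		for (;;) {
			# pick the ready commit with the newest commit time
			# (ties broken by id so the output is stable)
			best = ""
			for (c in ready)
				if (best == "" || time[c] > time[best] ||
				    (time[c] == time[best] && c < best))
					best = c
			if (best == "")
				break
			print best
			delete ready[best]
			# a parent becomes ready once all its children are out
			for (i = 1; i <= nparents[best]; i++) {
				p = parent[best, i]
				if (--children[p] == 0 && (p in time))
					ready[p] = 1
			}
		}
	}'
}
```

A plain sort by commit time would show a parent too early whenever its
timestamp is later than a child's; holding each commit back until all of
its children are out avoids that.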

Thanks,
Paul.

^ permalink raw reply

