Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] Improve git-prune -n output
From: Jakub Narebski @ 2006-10-28 14:00 UTC (permalink / raw)
  To: git
In-Reply-To: <200610270937.11368.andyparkins@gmail.com>

Andy Parkins wrote:

> prune_object() in show_only mode would previously just show the path to the
> object that would be deleted.  The path the object is stored in shouldn't be
> shown to users, they only know about sha1 identifiers so show that instead.

This allowed to 'rm -f' [some] of what git-prune -n shows.

Did anybody used that? I don't know...

> Further, the sha1 alone isn't that useful for examining what is going to be
> deleted.  This patch also adds the object type to the output, which makes it
> easy to pick out, say, the commits and use git-show to display them.

Well, you can always use git-lost-found. Perhaps it would be better to add
-n option to git-lost-found to not create refs in $GIT_DIR/lost-found/
directory.

Sorry for being late with that suggestion.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


^ permalink raw reply

* Re: VCS comparison table
From: Jakub Narebski @ 2006-10-28 13:53 UTC (permalink / raw)
  To: bazaar-ng; +Cc: git
In-Reply-To: <m3mz7gheoe.fsf@iny.iki.fi>

Ilpo Nyyssönen wrote:

> "J. Bruce Fields" <bfields@fieldses.org> writes:
> 
>> Documentation helps, though sometimes extensive documentation is a sign
>> of a problem--it takes a lot more documentation to explain how to manage
>> a branch in CVS than it does in any sensible system....
> 
> Usability:
> 
> I have used bzr, bk for development and git very little for following
> kernel development. I have followed this discussion quite well.
> 
> 1. It is easier to start using something you are already familiar
> with. (Just try to use Mac OS X with a Windows or Linux background.)
> 
> G: Something totally new and so no points from here. The way of using
> git is just so different from any other similar software.
> 
> B: Quite clearly gets points from this. Normal branches work quite
> like many other software, the checkout stuff works like CVS and SVN.

I find for example concept of branches in Git extremly easy to understand.
Bazaar-NG "branches" is mixture of Git branch and Git repository/clone of
repository. In bzr "branch" refers to abstract SCM concept as part of DAG of
revisions sourced from given revision/head/tip (git branch is very close to
it); yet another but distinct abstract SCM concept of branch as "your" line
of development i.e. path in the DAG of revisions started at given
revision/head/tip and ending in initial/parentless revision; the physical
representation: working area, metainformation, storage or pointer to
storage (when branches share storage forming so called bzr "repository").

About checkout: Bazaar mixes here "CVS checkout" model in the "bzr checkout"
command, and SCM concept of checking-out i.e. getting files from repository
(or branch in bzr) to working area.

On the other side breaking with traditional concepts of _centralized_ SCM
in _distributed_ SCM (and geared towards distributed usage) is IMVHO a good
idea. And breaking with the cruft of bad ideas of CVS is very good idea.

But I agree that in Git some terminology (and names of commands) could be
better. Some of it stems from BitKeeper background, some from the way Git
was created: bottom-up, from repository layout to fully (or not ;-) fledged
SCM. For example "pull" as "fetch + merge" is IIRC BitKeeper legacy, while
the fact that "merge" command is low-level (or mid-level) command fairly
poorly usable for user (which should use "pull ." for merging from local
branch).

> 2. Finding commands.
> 
> G: Quite big amount of commands, some clear, but some not so. With all
> the installed commands, it is even more confusing. What's the
> difference between fetch and pull and which one I should use? Same for
> clone and branch.
>
> B: A bit clearer I think, but the pull and merge does cause confusion. 
> Also the checkout stuff could be better shown in the command line
> help. With plugins like bzrtools the amount of command raises and
> confusion increases. Maybe better separation for plugin commands in
> the command line help?

In Git Users Survey (http://git.or.cz/gitwiki/GitSurvey) the answer "too
many commands" was most common answer to question 6. "What did you find
hardest?" in the survey (which survey was base on Mercurial survey:
http://www.selenic.com/mercurial/wiki/index.cgi/UserSurvey). It would be
perhaps better for Git to clearly divide commands between porcelanish (for
end user), admin (whole repository level) and plumbing (for use in
scripts).

But for example git(7) man page lists git commands clearly divided between
low-level commands (plumbing): manipulation commands, interrogation
commands, synching commands and high level commands (porcelain): main
commands, ancillary commands. The "git help" and "git --help" shows the
most commonly used git commands with short description of each command
("git help -a" show all commands). 

I can understand confusion between "git pull" and "git fetch"; it is
adressed in documentation. Although I think the confusion between
"bzr merge" and "bzr pull" is as great if not greater.

I don't understand the confusion between "git branch" and "git clone"
commands... unless you are confused by Bazaar-NG branch-centric approach
which mixes branch with repository.

> 3. Understanding output
> 
> G: Speaks a language of its own, hard to understand. No progress
> reported for long lasting operations.
> 
> B: Could maybe speak a bit more. Progress reporting is quite good.

Which long lasting operations lack progress bar/progress reporting?
"git clone" and "git fetch"/"git pull" both have progress report
for both "smart" git://, git+ssh:// and local protocols, and "dumb"
http://, https://, ftp://, rsync:// protocols. "git rebase" has
progress report. "git am" has progress report.

But I agree that Git tends to speak in its own jargon. But this jargon is
very clear if you are familiar with Git. BTW. some of the worst offenders
like <ent> (== <tree-ish>) is removed already from documentation.

> 4. Misc stuff
> 
> G: You have only one workspace and this forces you to use git more or
> to make several repositories. 

This is your confusion stemming from Bazaar-NG branch-centricness. In Git
working area is associated with repository, not with branch as in bzr.
Usually you have repsoitory embedded in working area, in .git directory in
top level of working area. The fact that you have only one index (but you
can specify alternate index, or switch between index files), and only one
current branch marker namely HEAD (you can switch HEAD to other branch; if
I remember correctly there is no way to specify current head other way)
makes working with multiple working areas tied to one repository more
difficult. But it is usually not necessary in Git.

In Bazaar-NG "repository" is just sharing the storage of "branches"; in Git
you can share the storage between repositories (although it is not the
default mode), or share common old history between repositories (more
common). 

> You can't just diff branchA/foo branchB/foo.

You can: either using "git diff branchA branchB -- foo" which means
difference between branches branchA and branchB limited to the differences
on branch foo (where foo can be directory name or filename), or via
"extended SHA1 reference" using "git diff branchA:foo branchB:foo" which
means compare file/directory "foo" at revision "branchA" and file/directory
"foo" at revision "branchB".

You can even diff two different _repositories_ if they are on the same local
filesystem using pasky trick described in http://git.or.cz/gitwiki/GitTips.

> You can't just open file from old branch to check 
> something while you are developing in some new branch.

You can view file from old branch via "git cat-file -p old-branch:file".

> Do I have to commit my changes before changing a branch
> in the workspace? 

You have to. But we have "git commit --amend", so if I need to do this
I usually do "git commit -m 'TEMPORARY COMMIT'" before switching to other
branch. Or you can save differences between working area and current branch
to patch file. The "git-checkpoint" proposal adresses that... in rather
heavy-handed fashion. There is also "git-stash/git-unstash" floating
somewhere in git mailing list archives.

> G: What is this git repack thing and do I have to use it? If yes, why? 
> Nobody told me that I should run it, but I did notice Linus mentioning
> it somewhere. Definetly causing harm for usability.

Hmm... perhaps "repack -a -d" should be shown in "git help" list of commonly
used commands output.

Having two separate formats in repository: loose (but compressed) and packed
(in one file, deltaified, compressed) has the following advantages:

0. Historical, it allowed for git to be released (deployed) early,
originally as fast content tracker and not full SCM, and to add features
based on how people used it and scripted it. It also gave Git design the
advantage of not being tailored/based on some storage mechanism, which
resulted in IMHO very clean design and concepts.

1. Security (together with format). It secures repository against corruption
stemming from: corruption during saving file, race condition, interruptions
during operation etc.; although it doesn' save against all possible errors.
That is what sold Keith on choosing Git as SCM for X.Org:
http://keithp.com/blog/Repository_Formats_Matter.html

2. Efficiency. The packed Git format is both AFAIK the densest repository
format from OSS SCM, and it is very fast to access any given revision.

3. Net format. It allows to use _exactly_ the same format for transmission
during clone and fetch; well with the exception that for "smart" protocols
git can send "thin" pack, with some deltas without bases. The latest work
in progress by Nicolas Pitre and others to convert thin pack to full pack
without exploding it into loose objects in between.

There quite frequently appears suggestion for SCM based on Git, or Git
porcelains (like Cogito) to automatically repack. Latest work on the option
to repack to not pack only loose objects, or repack everything, but to
repack given pack or repack with exception of some archive packs should
help with that solution.

> B: People migth misuse the revnos and so be confused when things won't
> work like they expected.

Revnos work only with very specific workflows.

> Conclusion: I would say that Bazaar is more usable than git.

Conclusion: I would say that Git is more usable than Bazaar.

^ permalink raw reply

* Re: lockfiles & fork()
From: Florian Weimer @ 2006-10-28 13:30 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.63.0610280319250.26682@wbgn013.biozentrum.uni-wuerzburg.de>

* Johannes Schindelin:

> when you setup a lockfile, and then fork(), you could easily end up with 
> atexit() kicking in to remove the lockfile, before the main process has a 
> chance to commit. (Yes, I need to hold the lock long before the fork()).
>
> Any ideas how to solve the problem?

I think you should call _exit in the child instead of exit.

^ permalink raw reply

* [PATCH] Bash completion support for aliases
From: Dennis Stosberg @ 2006-10-28 12:12 UTC (permalink / raw)
  To: git

 - Add aliases to the list of available git commands.
 - Make completion work for aliased commands.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
---
 contrib/completion/git-completion.bash |   29 +++++++++++++++++++++++++++--
 1 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index d9cb17d..b074f4f 100755
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -101,6 +101,23 @@ __git_complete_file ()
 	esac
 }
 
+__git_aliases ()
+{
+	git repo-config --list | grep '^alias\.' \
+		| sed -e 's/^alias\.//' -e 's/=.*$//'
+}
+
+__git_aliased_command ()
+{
+	local cmdline=$(git repo-config alias.$1)
+	for word in $cmdline; do
+		if [ "${word##-*}" ]; then
+			echo $word
+			return
+		fi
+	done
+}
+
 _git_branch ()
 {
 	local cur="${COMP_WORDS[COMP_CWORD]}"
@@ -264,10 +281,18 @@ _git ()
 {
 	if [ $COMP_CWORD = 1 ]; then
 		COMPREPLY=($(compgen \
-			-W "--version $(git help -a|egrep '^ ')" \
+			-W "--version $(git help -a|egrep '^ ') \
+			    $(__git_aliases)" \
 			-- "${COMP_WORDS[COMP_CWORD]}"))
 	else
-		case "${COMP_WORDS[1]}" in
+		local command="${COMP_WORDS[1]}"
+		local expansion=$(__git_aliased_command "$command")
+
+		if [ "$expansion" ]; then
+			command="$expansion"
+		fi
+
+		case "$command" in
 		branch)      _git_branch ;;
 		cat-file)    _git_cat_file ;;
 		checkout)    _git_checkout ;;
-- 

^ permalink raw reply related

* Re: VCS comparison table
From: Jakub Narebski @ 2006-10-28 11:38 UTC (permalink / raw)
  To: Jan Hudec
  Cc: Linus Torvalds, Erik B?gfors, Matthieu Moy, bazaar-ng, Sean, git
In-Reply-To: <20061027045137.GB3179@artax.karlin.mff.cuni.cz>

Jan Hudec wrote:
> Actually bzr used to have slightly different numbering scheme not long
> ago. There was a revision-history in each branch listing the revisions
> in order in which they were commited or merged in. Some time ago it was
> changed to numbering along the leftmost parent, which was, IIRC, deemed
> simpler and a little more logical. But in the light of these arguments,
> maybe the former system was better -- it was more dependent on the
> actual location, but on the other hand it allowed (or could allow --
> IIRC there was some problem with it) to fast-forward merge while
> _locally_ keeping the meaning of old revision numbers. In fact, the
> revision-history used to be almost exactly the same as git reflog,
> except it only stored the revids, not the times.

Which is very fine if you don't modify the history (amending commits,
rewinding history to earlier point, rebasing the branch, merging branch
in and starting it anew aka. dovetail approach if I remember correctly),
and if you are not concerned with performance when fetching larger
number of commits into branch (as you have to assign number to them).

Which was perhaps why bzr changed from revnolog to leftmost/first parent
as a way to keep branch-as-path/assing revision numbers to revisions.
Which has it's own disadvantages as enumerated multiple times here
on the list.
-- 
Jakub Narebski

^ permalink raw reply

* Re: VCS comparison table
From: Ilpo Nyyssönen @ 2006-10-28 11:18 UTC (permalink / raw)
  To: bazaar-ng; +Cc: git
In-Reply-To: <20061027144656.GA32451@fieldses.org>

"J. Bruce Fields" <bfields@fieldses.org> writes:

> Documentation helps, though sometimes extensive documentation is a sign
> of a problem--it takes a lot more documentation to explain how to manage
> a branch in CVS than it does in any sensible system....

Usability:

I have used bzr, bk for development and git very little for following
kernel development. I have followed this discussion quite well.

1. It is easier to start using something you are already familiar
with. (Just try to use Mac OS X with a Windows or Linux background.)

G: Something totally new and so no points from here. The way of using
git is just so different from any other similar software.

B: Quite clearly gets points from this. Normal branches work quite
like many other software, the checkout stuff works like CVS and SVN.

2. Finding commands.

G: Quite big amount of commands, some clear, but some not so. With all
the installed commands, it is even more confusing. What's the
difference between fetch and pull and which one I should use? Same for
clone and branch.

B: A bit clearer I think, but the pull and merge does cause confusion. 
Also the checkout stuff could be better shown in the command line
help. With plugins like bzrtools the amount of command raises and
confusion increases. Maybe better separation for plugin commands in
the command line help?

3. Understanding output

G: Speaks a language of its own, hard to understand. No progress
reported for long lasting operations.

B: Could maybe speak a bit more. Progress reporting is quite good.

4. Misc stuff

G: You have only one workspace and this forces you to use git more or
to make several repositories. You can't just diff branchA/foo
branchB/foo. You can't just open file from old branch to check
something while you are developing in some new branch. Do I have to
commit my changes before changing a branch in the workspace?

G: What is this git repack thing and do I have to use it? If yes, why? 
Nobody told me that I should run it, but I did notice Linus mentioning
it somewhere. Definetly causing harm for usability.

B: People migth misuse the revnos and so be confused when things won't
work like they expected.

Conclusion: I would say that Bazaar is more usable than git.

^ permalink raw reply

* [PATCH v2] Fixes "stg goto `stg top`" to no-op & adds test
From: Ilpo Järvinen @ 2006-10-28 10:52 UTC (permalink / raw)
  To: catalin.marinas; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0610280156320.26765@kivilampi-30.cs.helsinki.fi>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2151 bytes --]

Please forgive me that didn't read the test metadata through while 
producing the last patch. This time also it should be correct...

StGIT tried to access index that is not valid when goto'ing to
the current patch. Adds also a test for it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 stgit/commands/common.py |   17 ++++++++++-------
 t/t1600-goto-top.sh      |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/stgit/commands/common.py b/stgit/commands/common.py
index 1ea6025..88b1b94 100644
--- a/stgit/commands/common.py
+++ b/stgit/commands/common.py
@@ -200,16 +200,19 @@ def pop_patches(patches, keep = False):
     """Pop the patches in the list from the stack. It is assumed that
     the patches are listed in the stack reverse order.
     """
-    p = patches[-1]
-    if len(patches) == 1:
-        print 'Popping patch "%s"...' % p,
+    if len(patches) == 0:
+        print 'nothing to push/pop'
     else:
-        print 'Popping "%s" - "%s" patches...' % (patches[0], p),
-    sys.stdout.flush()
+        p = patches[-1]
+        if len(patches) == 1:
+            print 'Popping patch "%s"...' % p,
+        else:
+            print 'Popping "%s" - "%s" patches...' % (patches[0], p),
+        sys.stdout.flush()
 
-    crt_series.pop_patch(p, keep)
+        crt_series.pop_patch(p, keep)
 
-    print 'done'
+        print 'done'
 
 def parse_patches(patch_args, patch_list):
     """Parse patch_args list for patch names in patch_list and return
diff --git a/t/t1600-goto-top.sh b/t/t1600-goto-top.sh
new file mode 100755
index 0000000..618ebc7
--- /dev/null
+++ b/t/t1600-goto-top.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Ilpo Järvinen
+#
+
+test_description='Test goto to the current patch.
+
+'
+
+. ./test-lib.sh
+
+test_expect_success \
+	'Initialize the StGIT repository' \
+	'stg init
+'
+
+test_expect_success \
+	'Create the first patch' \
+	'
+	stg new foo -m "Foo Patch" &&
+	echo foo > test &&
+	stg add test &&
+	stg refresh
+	'
+
+test_expect_success \
+	'Goto current patch' \
+	'
+	stg goto `stg top`
+	'
+
+test_done
-- 
1.4.2

^ permalink raw reply related

* Re: fetching packs and storing them as packs
From: Shawn Pearce @ 2006-10-28  8:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <20061028072146.GB14607@spearce.org>

Shawn Pearce <spearce@spearce.org> wrote:
> Why not just use create a new flag file?
> 
> Lets say that a pack X is NOT eligible to be repacked if
> "$GIT_DIR/objects/pack/pack-X.keep" exists.

Here's the `git repack -a -d` portion of that.
Thoughts?

-- >8 --
[PATCH] Only repack active packs by skipping over kept packs.

During `git repack -a -d` only repack objects which are loose or
which reside in an active (a non-kept) pack.  This allows the user
to keep large packs as-is without continuous repacking and can be
very helpful on large repositories.  It should also help us resolve
a race condition between `git repack -a -d` and the new pack store
functionality in `git-receive-pack`.

Kept packs are those which have a corresponding .keep file in
$GIT_OBJECT_DIRECTORY/pack.  That is pack-X.pack will be kept
(not repacked and not deleted) if pack-X.keep exists in the same
directory when `git repack -a -d` starts.

Currently this feature is not documented and there is no user
interface to keep an existing pack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 git-repack.sh |   48 +++++++++++++++++++++++++++++++-----------------
 1 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/git-repack.sh b/git-repack.sh
index 17e2452..fe1e2ef 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -43,13 +43,30 @@ trap 'rm -f "$PACKTMP"-*' 0 1 2 3 15
 case ",$all_into_one," in
 ,,)
 	args='--unpacked --incremental'
+	active=
 	;;
 ,t,)
-	args=
-
-	# Redundancy check in all-into-one case is trivial.
-	existing=`test -d "$PACKDIR" && cd "$PACKDIR" && \
-	    find . -type f \( -name '*.pack' -o -name '*.idx' \) -print`
+	args=--unpacked
+	active=
+	if test -d "$PACKDIR"
+	then
+		for p in `find "$PACKDIR" -type f -name '*.pack' -print`
+		do
+			n=`basename "$p" .pack`
+			d=`dirname "$p"`
+			if test -e "$d/$n.keep"
+			then
+				: keep
+			else
+				args="$args --unpacked=$p"
+				active="$active $n"
+			fi
+		done
+	fi
+	if test "X$args" = X--unpacked
+	then
+		args='--unpacked --incremental'
+	fi
 	;;
 esac
 
@@ -86,20 +103,17 @@ fi
 
 if test "$remove_redundant" = t
 then
-	# We know $existing are all redundant only when
-	# all-into-one is used.
-	if test "$all_into_one" != '' && test "$existing" != ''
+	# We know $active are all redundant.
+	if test "$active" != ''
 	then
 		sync
-		( cd "$PACKDIR" &&
-		  for e in $existing
-		  do
-			case "$e" in
-			./pack-$name.pack | ./pack-$name.idx) ;;
-			*)	rm -f $e ;;
-			esac
-		  done
-		)
+		for n in $active
+		do
+			if test "$n" != "pack-$name"
+			then
+				rm -f "$PACKDIR/$n.pack" "$PACKDIR/$n.idx"
+			fi
+		done
 	fi
 	git-prune-packed
 fi
-- 
1.4.3.3.g7d63

^ permalink raw reply related

* Re: fetching packs and storing them as packs
From: Shawn Pearce @ 2006-10-28  7:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7vwt6l9etn.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> > I would MUCH rather we just rename the index/pack file to something that 
> > git can _use_, but that "git repack -a -d" won't remove....
> 
> Two points.
> 
> The "locking" I mentioned was between receive-pack and repack -a
> -d; upload-pack (what millions people are using to read from the
> repository you are pushing into) is not affected.  So in that
> sense, we can afford to use lock without much contention.

And giving how difficult locking is to get on most filesystems I'd
just rather avoid any sort of locking whenever possible.  That's one
reason why reflog is 1 file per ref and not 1 file per repository...

> I just thought of a cute hack that does not involve renaming
> packs at all (so no need to match new-pack-X.pack with
> pack-X.idx), and Shawn's sequence actually would work, which is:

I take this above statement to mean that you answered your own
question about how my sequence is able to resolve the race condition?

> The receive-pack side:
> 
>   a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
>   b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
>   c. Write pack and index, in "inactive" state.
>   d. Move pack to $GIT_DIR/objects/pack/...
>   e. Move idx to $GIT_DIR/objects/pack...
>   f. Update refs.
>   g. Mark new pack and idx as "active".
> 
> The "repack -a -d" side:
> 
>   1. List all active packs and store in memory.
>   2. Repack only loose objects and objects contained in active packs.
>   3. Move new pack and idx into $GIT_DIR/objects/pack/...
>   4. Mark new pack and idx as "active".
>   5. Delete active packs found by step #1.
> 
> Pack-idx pair is marked "active" by "chmod u+s" the .pack file.
> During the normal operation, all .pack/.idx pair in objects/pack/
> directories are usable regardless of the setuid bit; we would
> never make .pack files executable so u+s would not otherwise
> hurt us either.  "active" probably is better read as "eligible
> for repacking".

As cool as that trick is I'm against using the file mode as a
way to indicate the status of a pack file.  For one thing not
every filesystem that Git is used on handles file modes properly.
We already have core.filemode thanks to some of those and I use
Git on at least one of those "not so friendly" filesystems...

Why not just use create a new flag file?

Lets say that a pack X is NOT eligible to be repacked if
"$GIT_DIR/objects/pack/pack-X.keep" exists.

Thus we want to have the new ".keep" file for historical packs and
incoming receive-pack between steps c and g.  In the former case
the historical pack is already "very large" and thus one additional
empty file to indicate we want to retain that pack as-is is trivial
overhead (relatively speaking); in the latter case the lifespan of
the file is relatively short and thus any overhead associated with it
on the local filesystem is free (it may never even hit the platter).

In the sequence above we create pack-X.keep between steps b and c
during receive-pack ensuring that even before the pack is usable by
a Git reader process that it can't be swept up by a `repack -a -d`
and we delete the pack-X.keep file in step g to mark it active.

Further only repack and the receive-pack side code changes: all
existing packs are automatically taken to be active while only
packs coming in from receive-pack or those marked by a human as
"historical" will be kept.

Two birds, one stone.  Thoughts?

-- 

^ permalink raw reply

* Re: [PATCH] Add git-count-packs, like git-count-objects.
From: Shawn Pearce @ 2006-10-28  6:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre
In-Reply-To: <7v8xj1axlm.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > Now that we are starting to save packs rather than unpacking into
> > loose objects its nice to have a way to list the number of current
> > packs and their total size.  This can help the user in deciding
> > when its time to run `git repack -a -d`.
> 
> Why not just do "ls -lh $GIT_OBJECT_DIR/pack/pack-*.pack"???

Because whatever we decide to use to make a pack 'active' may not
be that simple.

Whatever.  It was clearly a very tiny patch put together quickly
before dinner tonight, perhaps not worth including at this point.
Lets see what comes out of the other pack oriented discussion first.

-- 

^ permalink raw reply

* Re: Generating docu in 1.4.3.3.g01929
From: Sean @ 2006-10-28  6:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Horst H. von Brand
In-Reply-To: <7vr6wt9enk.fsf@assigned-by-dhcp.cox.net>

On Fri, 27 Oct 2006 22:45:51 -0700
Junio C Hamano <junkio@cox.net> wrote:

> Eh, do you mean bisecting asciidoc?  I am not seeing the problem
> with these on a freshly installed FC6:
> 

Yeah.. don't see the problem here either.  But assuming there
is some strange interaction with Horst's environment, bisecting
would narrow it down.  Even though I don't really think bisecting
will turn up a problem in Git, it might identify the problem in
the environment.. 

^ permalink raw reply

* Re: Generating docu in 1.4.3.3.g01929
From: Junio C Hamano @ 2006-10-28  5:45 UTC (permalink / raw)
  To: Sean; +Cc: git, Horst H. von Brand
In-Reply-To: <BAYC1-PASMTP04E0376BEE45F9A676DB03AE050@CEZ.ICE>

Sean <seanlkml@sympatico.ca> writes:

>> > Can't reproduce this here on master or on next with:
>> >  asciidoc-7.1.2-0 and xmlto-0.0.18-13.1
>> > Maybe this is an Asciidoc 8 issue, are you using it?
>> 
>> Fedora rawhide i386, with:
>> 
>>   asciidoc-7.0.2-3.fc6
>>   xmlto-0.0.18-13.1
>> 
>> Perhaps too old, not too new...
>
> Can't imagine that it's too old.  You may have to bisect to figure
> out what the culprit is. :o/

Eh, do you mean bisecting asciidoc?  I am not seeing the problem
with these on a freshly installed FC6:

Name   : asciidoc
Arch   : noarch
Version: 7.0.2
Release: 3.fc6

Name   : xmlto
Arch   : i386
Version: 0.0.18
Release: 13.1

^ permalink raw reply

* Re: fetching packs and storing them as packs
From: Junio C Hamano @ 2006-10-28  5:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Shawn Pearce
In-Reply-To: <Pine.LNX.4.64.0610272109500.3849@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Fri, 27 Oct 2006, Shawn Pearce wrote:
>> 
>> So a reader-writer lock is preferred over
>> a non-locking solution such as I posted in
>> http://article.gmane.org/gmane.comp.version-control.git/30288 ?
>> 
>> Not to mention that such a solution would also fix the -d issue
>> Linus points out above.
>
> Be very careful.
>
> There's a good reason why git doesn't use locking, and tends to use the 
> "create file exclusively and move over the old version after having tested 
> that the old version is still relevant" approach.
>
> Two _major_ issues:
>...
>
> I would MUCH rather we just rename the index/pack file to something that 
> git can _use_, but that "git repack -a -d" won't remove....

Two points.

The "locking" I mentioned was between receive-pack and repack -a
-d; upload-pack (what millions people are using to read from the
repository you are pushing into) is not affected.  So in that
sense, we can afford to use lock without much contention.

I just thought of a cute hack that does not involve renaming
packs at all (so no need to match new-pack-X.pack with
pack-X.idx), and Shawn's sequence actually would work, which is:

The receive-pack side:

  a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
  b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
  c. Write pack and index, in "inactive" state.
  d. Move pack to $GIT_DIR/objects/pack/...
  e. Move idx to $GIT_DIR/objects/pack...
  f. Update refs.
  g. Mark new pack and idx as "active".

The "repack -a -d" side:

  1. List all active packs and store in memory.
  2. Repack only loose objects and objects contained in active packs.
  3. Move new pack and idx into $GIT_DIR/objects/pack/...
  4. Mark new pack and idx as "active".
  5. Delete active packs found by step #1.

Pack-idx pair is marked "active" by "chmod u+s" the .pack file.
During the normal operation, all .pack/.idx pair in objects/pack/
directories are usable regardless of the setuid bit; we would
never make .pack files executable so u+s would not otherwise
hurt us either.  "active" probably is better read as "eligible
for repacking".

^ permalink raw reply

* Re: Generating docu in 1.4.3.3.g01929
From: Sean @ 2006-10-28  4:24 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: git
In-Reply-To: <200610272312.k9RNCo2Q002623@laptop13.inf.utfsm.cl>

On Fri, 27 Oct 2006 20:12:50 -0300
"Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote:

> > > asciidoc -b docbook -d manpage -f asciidoc.conf git-daemon.txt
> > > xmlto -m callouts.xsl man git-daemon.xml
> > > error : unterminated entity reference                
> > > error : unterminated entity reference                
> > > error : unterminated entity reference             ...
> > > error : unterminated entity reference                
> > > error : unterminated entity reference                
> > > Writing git-daemon.1 for refentry
> > 
> > Can't reproduce this here on master or on next with:
> >  asciidoc-7.1.2-0 and xmlto-0.0.18-13.1
> > Maybe this is an Asciidoc 8 issue, are you using it?
> 
> Fedora rawhide i386, with:
> 
>   asciidoc-7.0.2-3.fc6
>   xmlto-0.0.18-13.1
> 
> Perhaps too old, not too new...

Can't imagine that it's too old.  You may have to bisect to figure
out what the culprit is. :o/


^ permalink raw reply

* Re: fetching packs and storing them as packs
From: Linus Torvalds @ 2006-10-28  4:18 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git
In-Reply-To: <20061028034206.GA14044@spearce.org>

On Fri, 27 Oct 2006, Shawn Pearce wrote:
> 
> So a reader-writer lock is preferred over
> a non-locking solution such as I posted in
> http://article.gmane.org/gmane.comp.version-control.git/30288 ?
> 
> Not to mention that such a solution would also fix the -d issue
> Linus points out above.

Be very careful.

There's a good reason why git doesn't use locking, and tends to use the 
"create file exclusively and move over the old version after having tested 
that the old version is still relevant" approach.

Two _major_ issues:

 - just about any other locking algorithm simply doesn't work on some 
   filesystems. And then you're just royally screwed.

 - I want to be able to push out, regardless of whether there is somebody 
   (or millions of somebodies) reading the repository at the same time. So 
   locking is not acceptable for "normal operations" at all - at most this 
   would be a "keep a repack from interfering with another repack" kind of 
   thing.

I would MUCH rather we just rename the index/pack file to something that 
git can _use_, but that "git repack -a -d" won't remove. In other words, 
rather than locking, it would be much better to just use a naming rule: 
when we download a new pack, the new pack will be called

	new-pack-<SHA1ofobjectlist>.pack
	new-pack-<SHA1ofobjectlist>.idx

and we just make the rule that "git repack -a -d" will only ever touch 
packs that are called just "pack-*.{pack|idx}", and never anything else.

It really is that simple. Allow normal git object opens to open the 
"temporary file" naming version too (so that you can install the refs 
before the rename, and all the objects will be visible), but don't allow 
"git repack" to remove packs that are in the process of being installed.

Race removed, and no locking really needed. At most, we might need to be 
able to match up a "new-pack-*.idx" file with a "pack-*.pack" file when we 
open pack-files, simply because we can't rename two files atomically, so 
the pack-file and index file would potentially exist with "different" 
names for a short window. 

That kind of small semantic changes are _way_ better than introducing 
locking, which will inevitably have much worse error cases (not working, 
stale locks, inability to push because something is really slow, or any 
number of other problems).

^ permalink raw reply

* Re: [PATCH] Add git-count-packs, like git-count-objects.
From: Junio C Hamano @ 2006-10-28  4:11 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, Nicolas Pitre
In-Reply-To: <20061028040056.GA14191@spearce.org>

Shawn Pearce <spearce@spearce.org> writes:

> Now that we are starting to save packs rather than unpacking into
> loose objects its nice to have a way to list the number of current
> packs and their total size.  This can help the user in deciding
> when its time to run `git repack -a -d`.

Why not just do "ls -lh $GIT_OBJECT_DIR/pack/pack-*.pack"???

^ permalink raw reply

* Re: fetching packs and storing them as packs
From: Junio C Hamano @ 2006-10-28  4:09 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git
In-Reply-To: <20061028034206.GA14044@spearce.org>

Shawn Pearce <spearce@spearce.org> writes:

> So a reader-writer lock is preferred over
> a non-locking solution such as I posted in
> http://article.gmane.org/gmane.comp.version-control.git/30288 ?

If you mean these two in your message to be "solution":

   So the receive-pack process becomes:

     a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
     b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
     c. Write pack and index.
     d. Move pack to $GIT_DIR/objects/pack/...
     e. Move index to $GIT_DIR/objects/pack...
     f. Update refs.
     g. Arrange for new pack and index to be considered active.

   And the repack -a -d process becomes:

     1. List all active packs and store in memory.
     2. Repack only loose objects and objects contained in active packs.
     3. Move new pack and idx into $GIT_DIR/objects/pack/...
     4. Arrange for new pack and idx to be considered active.
     5. Delete active packs found by step #1.

I am not so sure how it solves anything at all.

The race is about this sequence:

      - git-receive-pack is spawned from remove git-send-pack;
        it lets "index-pack --stdin --fatten" to keep the pack.

      - index-pack does its magic and moves the pack and idx
        to their final location;

      - "repack -a -d" is started by somebody else; it first
        remembers all the existing packs; it does the usual
        repacking-into-one.

      - git-receive-pack that invoked the index-pack waits for
        index-pack to finish, and then updates the refs;

      - "repack -a -d" is done repacking; removes the packs
        that existed when it checked earlier.

Now, I am not sure what your plan to "arrange for new pack and
idx to be considered active" is.  Care to explain?

There is a tricky constraints imposed on us by (arguably broken)
commit walkers in that it relies on the (arguably broken)
sha1_file.c:sha1_pack_name() interface, so naming historical
ones $GIT_OBJECT_DIR/pack/hist-X{40}.pack would not work; we
would need to fix commit walkers for that first.




^ permalink raw reply

* [PATCH] Add git-count-packs, like git-count-objects.
From: Shawn Pearce @ 2006-10-28  4:00 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Nicolas Pitre

Now that we are starting to save packs rather than unpacking into
loose objects its nice to have a way to list the number of current
packs and their total size.  This can help the user in deciding
when its time to run `git repack -a -d`.

In the future when we actually start to support historical packs
vs. active packs we probably should be reporting by default on the
number of active packs and ignoring the historical packs.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .gitignore                        |    1 +
 Documentation/git-count-packs.txt |   29 +++++++++++++++++++++++++++++
 Makefile                          |    1 +
 builtin-count-packs.c             |   29 +++++++++++++++++++++++++++++
 builtin.h                         |    1 +
 git.c                             |    1 +
 6 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index b670877..31be347 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,6 +23,7 @@ git-commit
 git-commit-tree
 git-convert-objects
 git-count-objects
+git-count-packs
 git-cvsexportcommit
 git-cvsimport
 git-cvsserver
diff --git a/Documentation/git-count-packs.txt b/Documentation/git-count-packs.txt
new file mode 100644
index 0000000..1420241
--- /dev/null
+++ b/Documentation/git-count-packs.txt
@@ -0,0 +1,29 @@
+git-count-packs(1)
+====================
+
+NAME
+----
+git-count-packs - Reports on packs
+
+SYNOPSIS
+--------
+'git-count-packs'
+
+DESCRIPTION
+-----------
+This counts the number of pack files and disk space consumed by
+them, to help you decide when it is a good time to repack.
+
+
+Author
+------
+Written by Shawn O. Pearce <spearce@spearce.org>
+
+Documentation
+--------------
+Documentation by Shawn O. Pearce.
+
+GIT
+---
+Part of the gitlink:git[7] suite
+
diff --git a/Makefile b/Makefile
index 2d62efb..b7fd558 100644
--- a/Makefile
+++ b/Makefile
@@ -275,6 +275,7 @@ BUILTIN_OBJS = \
 	builtin-check-ref-format.o \
 	builtin-commit-tree.o \
 	builtin-count-objects.o \
+	builtin-count-packs.o \
 	builtin-diff.o \
 	builtin-diff-files.o \
 	builtin-diff-index.o \
diff --git a/builtin-count-packs.c b/builtin-count-packs.c
new file mode 100644
index 0000000..f5a5940
--- /dev/null
+++ b/builtin-count-packs.c
@@ -0,0 +1,29 @@
+/*
+ * Builtin "git count-packs".
+ *
+ * Copyright (c) 2006 Shawn Pearce
+ */
+
+#include "cache.h"
+#include "builtin.h"
+
+static const char count_packs_usage[] = "git-count-packs";
+
+int cmd_count_packs(int ac, const char **av, const char *prefix)
+{
+	struct packed_git *p;
+	unsigned long packs = 0, pack_size = 0;
+
+	if (!ac)
+		usage(count_packs_usage);
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		if (!p->pack_local)
+			continue;
+		packs++;
+		pack_size += p->pack_size / (1024 * 1024);
+	}
+	printf("%lu packs, %lu megabytes\n", packs, pack_size);
+	return 0;
+}
diff --git a/builtin.h b/builtin.h
index 708a2f2..410577e 100644
--- a/builtin.h
+++ b/builtin.h
@@ -22,6 +22,7 @@ extern int cmd_checkout_index(int argc,
 extern int cmd_check_ref_format(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
+extern int cmd_count_packs(int argc, const char **argv, const char *prefix);
 extern int cmd_diff_files(int argc, const char **argv, const char *prefix);
 extern int cmd_diff_index(int argc, const char **argv, const char *prefix);
 extern int cmd_diff(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 6475847..9ad0f01 100644
--- a/git.c
+++ b/git.c
@@ -227,6 +227,7 @@ static void handle_internal_command(int
 		{ "check-ref-format", cmd_check_ref_format },
 		{ "commit-tree", cmd_commit_tree, RUN_SETUP },
 		{ "count-objects", cmd_count_objects, RUN_SETUP },
+		{ "count-packs", cmd_count_packs, RUN_SETUP },
 		{ "diff", cmd_diff, RUN_SETUP | USE_PAGER },
 		{ "diff-files", cmd_diff_files, RUN_SETUP },
 		{ "diff-index", cmd_diff_index, RUN_SETUP },
-- 
1.4.3.3.g7d63

^ permalink raw reply related

* Re: fetching packs and storing them as packs
From: Shawn Pearce @ 2006-10-28  3:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7v3b99e87c.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
[snip]
> Can we perhaps have reader-writer lock on the filesystem to
> pretect the repository?  "prune" can also be made into a writer
> for that lock and "fetch-pack --keep" would be a reader for the
> lock.  That reader-writer lock would solve the issue rather
> nicely.
> 
> > That said, I think some of the "git repack -d" logic is also unnecessarily 
> > fragile.
> 
> Noted; will fix.

So a reader-writer lock is preferred over
a non-locking solution such as I posted in
http://article.gmane.org/gmane.comp.version-control.git/30288 ?

Not to mention that such a solution would also fix the -d issue
Linus points out above.

-- 

^ permalink raw reply

* What's in git.git "next"
From: Junio C Hamano @ 2006-10-28  2:09 UTC (permalink / raw)
  To: git

I've fixed up merge-recursive not to warn "working file will be
clobbered" when unneeded, and merged it to "next" tonight.  This
is a usability fix but if done incorrectly we can break a safety
valve.

The "next" version changes behaviour from the traditional one
when (1) paths that are untracked in our branch exists in the
common ancestor and the other branch we merge into our branch,
(2) the other branch did not make any changes to these paths,
and (3) the working tree has these paths as untracked files.

Under this condition, 3-way merge decides the path should not
exist in the result.  This has not been changed (and shouldn't
be).  But what is being fixed is what "should not exist" means.
We used to say "we have that path in our working tree, which
will be lost by the merge, so we won't merge".  Which was
perhaps safer but was inconvenient.  The corrected behaviour
should be "the path is not tracked in our branch, and the result
of the merge won't have it tracked either, and we will leave
those untracked working tree files as they were".

I added a handful tests to t6022 to catch potential breakages,
and the code still passes them, but that does not mean it is
perfect.  If the program refuses to proceed when it can, you
found a safer breakage I do not worry too much about.  If the
program overwrites or loses an untracked working tree file as a
result of the merge, then that means the updated merge-recursive
relaxed the check too much.

So please handle this with a bit of extra care than usual; I'd
appreciate extra sets of eyeballs to sanity check.

^ permalink raw reply

* Re: lockfiles & fork()
From: Linus Torvalds @ 2006-10-28  1:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0610280319250.26682@wbgn013.biozentrum.uni-wuerzburg.de>



On Sat, 28 Oct 2006, Johannes Schindelin wrote:
> 
> when you setup a lockfile, and then fork(), you could easily end up with 
> atexit() kicking in to remove the lockfile, before the main process has a 
> chance to commit. (Yes, I need to hold the lock long before the fork()).

The easiest way to handle this would be for the lockfile logic to just 
save the pid associated with each lockfile... And for the atexit function 
to just ignore it if the pid doesn't match. 


^ permalink raw reply

* Re: lockfiles & fork()
From: Junio C Hamano @ 2006-10-28  1:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0610280319250.26682@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> when you setup a lockfile, and then fork(), you could easily end up with 
> atexit() kicking in to remove the lockfile, before the main process has a 
> chance to commit. (Yes, I need to hold the lock long before the fork()).
>
> Any ideas how to solve the problem?

Depends on the arrangement between the parent and child
regarding who is responsible for unlocking.  If child expects to
work under the safety of the lock the parent prepared, and also
relies on the parent to wait for it and then unlock to ensure
that safety for the child, then I think the problem is simple.

How about defining a function in lockfile.c to assign NULL to
lock_file_list and call that immediately after fork() in the
child?

^ permalink raw reply

* lockfiles & fork()
From: Johannes Schindelin @ 2006-10-28  1:21 UTC (permalink / raw)
  To: git

Hi,

when you setup a lockfile, and then fork(), you could easily end up with 
atexit() kicking in to remove the lockfile, before the main process has a 
chance to commit. (Yes, I need to hold the lock long before the fork()).

Any ideas how to solve the problem?

Ciao,
Dscho

^ permalink raw reply

* Re: Generating docu in 1.4.3.3.g01929
From: Horst H. von Brand @ 2006-10-27 23:12 UTC (permalink / raw)
  To: Sean; +Cc: Horst H. von Brand, git
In-Reply-To: <20061027154433.da9b29d7.seanlkml@sympatico.ca>

Sean <seanlkml@sympatico.ca> wrote:

> On Fri, 27 Oct 2006 14:26:53 -0300
> "Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote:
> 
> > I'm getting lots of these after today's pull:
> > 
> > asciidoc -b docbook -d manpage -f asciidoc.conf git-daemon.txt
> > xmlto -m callouts.xsl man git-daemon.xml
> > error : unterminated entity reference                
> > error : unterminated entity reference                
> > error : unterminated entity reference             ...
> > error : unterminated entity reference                
> > error : unterminated entity reference                
> > Writing git-daemon.1 for refentry
> 
> Can't reproduce this here on master or on next with:
>  asciidoc-7.1.2-0 and xmlto-0.0.18-13.1
> Maybe this is an Asciidoc 8 issue, are you using it?

Fedora rawhide i386, with:

  asciidoc-7.0.2-3.fc6
  xmlto-0.0.18-13.1

Perhaps too old, not too new...
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239

^ permalink raw reply

* Re: [PATCH 1/3] make index-pâck able to complete thin packs
From: Nicolas Pitre @ 2006-10-28  0:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmz7hcs0o.fsf@assigned-by-dhcp.cox.net>

On Fri, 27 Oct 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > +static void fix_unresolved_deltas(int nr_unresolved)
> > +{
[...]
> > +		if (objects[d->obj_no].real_type != OBJ_REF_DELTA)
> > +			continue;
> 
> Just for the sake of my own sanity checking,...
> 
> Is it because base objects of OFS_DELTA _must_ exist in the same
> pack (even when --thin) that we do not have to worry about them
> in this function?

Indeed.  Otherwise it would be impossible to determine the offset to the 
base object if that base object is not in the pack.

That doesn't mean there isn't any OFS_DELTA left to resolve though.  
But this function is only about adding missing objects to the pack and 
the deltas that need them must be REF_DELTA objects.

Once a REF_DELTA has been resolved with an external object, it then 
constitute a possible new base for remaining OFS_DELTA or 
REF_DELTA objects, and that is found recursively in resolve_delta().

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox